
US006571013B1 

(12) United States Patent (10) Patent No.: US 6,571,013 B1

Macey et al. (45) Date of Patent: *May 27, 2003 



(54) AUTOMATIC METHOD FOR DEVELOPING 
CUSTOM ICR ENGINES 

(75) Inventors: Garrett N. Macey, Rockville, MD

(US); Ivan N. Bella, Gaithersburg, MD 
(US) 

(73) Assignee: Lockheed Martin Mission Systems,
Gaithersburg, MD (US) 

( * ) Notice: This patent issued on a continued pros- 
ecution application filed under 37 CFR 
1.53(d), and is subject to the twenty year 
patent term provisions of 35 U.S.C. 
154(a)(2). 

Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 0 days. 

(21) Appl. No.: 08/664,221 

(22) Filed: Jun. 11, 1996 

(51) Int. Cl.7 G06K 9/00

(52) U.S. Cl. 382/181; 382/187

(58) Field of Search 382/181, 190, 192, 201, 209, 159, 226, 160, 161; 235/435

(56) References Cited 

U.S. PATENT DOCUMENTS 



5,054,094 A * 10/1991 Barski 382/192 

5,317,652 A * 5/1994 Chatterjee 382/304 

5,321,773 A * 6/1994 Kopec et al 382/159 

5,526,444 A * 6/1996 Kopec et al 382/233 

5,594,809 A * 1/1997 Kopec 382/161 



OTHER PUBLICATIONS 

Mori et al, "Historical Review of OCR Research and 
Development", Proceedings of the IEEE, vol. 80, No. 7, 
7/92, pp. 1029-1057. 



Patrenahalli et al., "A Branch and Bound Algorithm for
Features Subset Selection", IEEE Transactions on Computers,
vol. C-26, No. 9, 9/77, pp. 917-922.
Okada et al., "An Optimal Orthonormal System for Dis- 
criminant Analysis", Pattern Recognition, vol. 18, No. 1, 
1985, pp. 139-144. 

Riccia et al., "Fisher Discriminant Analysis and Factor 
Analysis", IEEE Transactions on Pattern Analysis and 
Machine Intelligence, vol. PAMI-5, No. 1, Jan. 1993, pp. 
99-104. 

(List continued on next page.) 
Primary Examiner — Samir Ahmed 

(74) Attorney, Agent, or Firm— Venable LLP; Andrew C. 
Aitken 

(57) ABSTRACT 

A computer automated feature selection method based upon 
the evaluation of hyper-rectangles and the ability of these 
rectangles to discriminate between classes. The boundaries 
of the hyper-rectangles are established upon a binary feature 
space where each bit indicates the relationship of a real 
feature value to a boundary within the minimum and maxi- 
mum values for the feature across all samples. Data reduc- 
tion combines the binary vector spaces so that the number of 
samples within a single class is within a range which is 
computationally feasible. Identification of subclasses identifies
maximal subsets of S⁺ which are exclusive against S⁻.
Feature evaluation determines within a single subclass the
contribution of each feature towards the ability to discriminate
the subclass from S⁻. The base algorithm examines each
feature, dropping any feature which does not contribute
towards discrimination. A pair of statistics is generated for
each remaining feature. The statistics represent a measure of
how many samples from the class are within the subclass
and a measure of how important each feature is to discriminating
the subclass from S⁻. The values for each subclass are
then combined to generate a set of values for the class. These
class feature metrics are further merged into metrics evaluating
the features' contribution across the entire set of
classes. Feature reduction determines which features contribute
the least across the entire set of classes.

18 Claims, 5 Drawing Sheets 

AUTOMATIC METHOD FOR DEVELOPING
CUSTOM ICR ENGINES 

This invention was made with Government support 
under contract MDA-904-92-C-M300 awarded by the 
Department of Defense. The Government has certain rights 
in this invention. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates to a computer automated method 
for creating image recognition engines optimized for a given 
language and/or source; and more particularly to a compu- 
tationally efficient method for selecting, from a large uni- 
verse of features, subsets of features that optimize recogni- 
tion accuracy against a target data set. While not limited 
thereto, the invention will be explained in its preferred 
context for use in recognition of hand printed or machine 
printed characters. 

2. Description of the Prior Art 

As will be appreciated by those skilled in the art, each new 
language studied (and research in character recognition 
generally), proposes new character features to address spe- 
cific problems or attributes of the source data set. However, 
the addition of new features is not necessarily sufficient for 
improved performance of a character recognition engine. If 
too many features are used, the rate of successful character 
discrimination actually drops as more features are added, a 
phenomenon known in the art as peaking.

Each language (Japanese, Thai, English, et al.) has unique
characteristics. Traditionally image character recognition 
(ICR) engines have been developed that make use of a 
feature set which was hand developed to support the lan- 
guage. The quality of the ICR is directly related to the 
amount of research effort applied to the language. High 
quality ICR engines exist where large markets support a 
large investment in research and development. An auto- 
mated method that selects feature subsets would allow 
inexpensive development of ICR engines that perform well 
against languages that economically would not support a 
large investment. 

Then, too, mixed languages can present unique problems 
as an OCR engine may need different features to distinguish 
between the two (or more) languages than the feature set 
used to best recognize either language individually. An 
automated feature selection tool would generate a feature set 
that is tailored to handle the particular mix of languages 
involved. 

The following is a definition of certain of the terms used 
herein: 

Class: All samples of a given codepoint (character) within
the training data set.

Subclass: The set of samples within a single class which
share some common attribute(s).

Codepoint: The representation to a machine of what people
would recognize as a character.

Feature Vector: The set of measurements for the feature
universe for a single sample (v).

S⁺: The class under consideration.

S⁻: The set of all classes within the problem space other than
the codepoint under consideration (S⁺).

Exclusive: Used to describe a binary vector. Exclusive
indicates that the binary vector is distinct from all vectors
within S⁻ (eq. 1).

where:

∧ is the "and" operator,
∀ is the "all" operator, and
∈ means exists.

G: A subset of S⁺, also known as a "view".

α(G): The binary vector resulting from the logical conjunction
of the samples represented by G.

Θ(S⁺,S⁻): The collection of all subsets G in S⁺ such that
α(G) is exclusive against all vectors in S⁻.

Ω(S⁺,S⁻): The collection of all maximal subsets in Θ(S⁺,S⁻),
where maximal is defined as every H in Θ(S⁺,S⁻) such
that if α(H) is not exclusive against some α(G) in Θ(S⁺,S⁻)
then H=G.

MAX: Number of samples within the largest subclass occurring
in Ω(S⁺,S⁻) (eq. 2).

$$\mathrm{MAX}(\Omega) = \max_{G \in \Omega} |G| \qquad (2)$$

where: |G| is the number of elements in set G.

AVE: The average number of samples within the subclasses
occurring within Ω(S⁺,S⁻) (eq. 3).



$$\mathrm{AVE}(\Omega) = \frac{1}{M} \sum_{G \in \Omega} |G|, \quad \text{where } M = |\Omega| \qquad (3)$$
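
The definitions above can be made concrete with a minimal sketch. It assumes binary feature vectors are held as Python integers, and it assumes that "exclusive" means the vector covers no vector of S⁻ (that is, v ∧ y ≠ v for every y in S⁻), which is one reading of eq. 1 consistent with the "and" and "all" operators named in its legend. The function names are illustrative and do not come from the patent.

```python
def conjunction(view):
    """alpha(G): bitwise AND of every binary feature vector in the view G."""
    result = view[0]
    for vector in view[1:]:
        result &= vector
    return result


def is_exclusive(vector, s_minus):
    """Assumed reading of eq. 1: the vector covers no vector of S-."""
    return all(vector & y != vector for y in s_minus)


def max_size(omega):
    """MAX: number of samples in the largest subclass of the collection."""
    return max(len(view) for view in omega)


def ave_size(omega):
    """AVE: mean number of samples per subclass of the collection."""
    return sum(len(view) for view in omega) / len(omega)


# Two subclasses (views) of one class, each a list of binary vectors.
omega = [[0b1101, 0b0111], [0b1110]]
largest = max_size(omega)
average = ave_size(omega)
```

With these two views, `largest` is 2 and `average` is 1.5.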

SUMMARY OF THE INVENTION

An object of this invention is the provision of a computer 
automated method to select a subset of features in order to 
limit the number of features used by the character recogni- 
tion engine, while optimizing its ability to discriminate 
among characters. That is, a method that narrows the number
of features to those features that provide the best within-class
consistency of recognition and the best cross-class
discrimination.

Another object of the invention is the provision of a 
computer automated method of feature selection that is 
suitable for use with a large number of classes (e.g. 1000+
characters) and a large number of features (e.g. 100+
features).

A further object of this invention is the provision of a 
computer automated feature selection method in which the 
selection program is executed in a distributed operation on 
multiple processors. 

Briefly, this invention contemplates the provision of a 
computer automated feature selection method based upon 
the evaluation of hyper-rectangles and the ability of these 
rectangles to discriminate between classes. The boundaries 
of the hyper-rectangles are established upon a binary feature 
space where each bit indicates the relationship of a real 
feature value to a boundary within the minimum and maxi- 
mum values for the feature across all samples. Data reduc- 
tion combines the binary vector spaces so that the number of 
samples within a single class is within a range which is 
computationally feasible. Identification of subclasses identifies
maximal subsets of S⁺ which are exclusive against S⁻.
Feature evaluation determines within a single subclass the
contribution of each feature towards the ability to discriminate
the subclass from S⁻. The base algorithm examines each
feature, dropping any feature which does not contribute
towards discrimination. A pair of statistics is generated for
each remaining feature. The statistics represent a measure of
how many samples from the class are within the subclass
and a measure of how important each feature is to discriminating
the subclass from S⁻. The values for each subclass are
then combined to generate a set of values for the class. These
class feature metrics are further merged into metrics evaluating
the features' contribution across the entire set of
classes. Feature reduction determines which features contribute
the least across the entire set of classes. These
features will be removed from consideration. The algorithm
drops features if they fail to reach a predetermined significance
level. If all features are found to be significant then the
feature with the lowest contribution metric is discarded.
Finally, peaking determination is used to determine if the
process should be repeated against the reduced feature
space. The peaking determination is done by examining the
rate of change within the significance metrics.
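
The two feature-reduction rules in this summary can be sketched as follows. This is a minimal illustration, assuming the per-feature contribution metrics have already been computed and are held in a dictionary; the name `reduce_features` and the use of a single numeric threshold for the "significance level" are assumptions, not the patent's own interface.

```python
def reduce_features(contribution, significance_level):
    """Apply the two reduction rules to a {feature: metric} mapping.

    Rule 1: drop every feature whose metric fails to reach the level.
    Rule 2: if nothing was dropped, discard the single weakest feature.
    """
    kept = {f: m for f, m in contribution.items() if m >= significance_level}
    if kept and len(kept) == len(contribution):
        weakest = min(kept, key=kept.get)
        del kept[weakest]
    return kept


metrics = {"stroke_count": 0.5, "aspect_ratio": 0.1, "hole_count": 0.3}
after_threshold = reduce_features(metrics, 0.2)   # rule 1 removes aspect_ratio
after_fallback = reduce_features(metrics, 0.05)   # rule 2 removes aspect_ratio
```

In both calls the remaining features are `stroke_count` and `hole_count`; the difference is which rule removed `aspect_ratio`.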

The basic algorithm is set forth in two articles by M. Kudo 
and M. Shimbo: "Feature Selection Based on the Structural 
Indices of Categories," Pattern Recognition 26 (1993) 
891-901, and "Optimal Subclasses with Dichotomous Variables
for Feature Selection and Discrimination," IEEE
Trans. Syst. Man Cybern. 19 (1989) 1194-1199, which are
incorporated herein by reference. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The foregoing and other objects, aspects and advantages
will be better understood from the following detailed
description of a preferred embodiment of the invention with
reference to the drawings, in which:

FIG. 1 is a diagram of a generalized procedure to provide
custom generation of an image character recognition engine
in accordance with the teachings of this invention.

FIG. 2 is a diagram using object model notation to
illustrate the feature vector subsets useful in the feature
selection method in accordance with the invention.

FIG. 3 is a block diagram of the distributed processing of
feature vectors in the feature selection process of the invention.

FIG. 4 is a flow diagram of one embodiment of the
method steps for feature selection in accordance with the
invention.

FIG. 5 is an example of determining the minimized
distance measure in two-dimensional feature space in accordance
with one aspect of the present invention.

DETAILED DESCRIPTION OF A PREFERRED 
EMBODIMENT OF THE INVENTION 

Referring now to FIG. 1, with the feature selection 
algorithm of the invention and a suitable prior art ICR tool,
it is now possible to develop and test ICR engines that are
customized to a source data set such as indicated in the block
10, Potential Source Material. A small portion of the source
data is selected as a training set in block 12. Using a suitable
prior art base ICR tool, this training set is properly segmented
and a truth model established for each of the
characters, blocks 14 and 16.

The segmentation takes as input the document image and 
produces as output a list of character coordinates from the 
image. The literature contains many methods for doing
segmentation, many of which would be suitable here. For
example, the segmenter used in the preferred embodiment
used image histograms to determine the lines and then
clustered the connected components in a line, broken into
strokes where needed, into the characters.
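
The line-finding part of such a histogram-based segmenter might look like the sketch below. It assumes a bilevel image stored as a list of pixel rows (1 for ink, 0 for background) and treats any run of rows containing ink as one text line. The patent does not specify the segmenter at this level of detail, so the function name and the zero-ink gap rule are assumptions, and the clustering of connected components into characters is not shown.

```python
def find_text_lines(image):
    """Return (top_row, bottom_row) pairs for each run of rows containing ink.

    image: list of rows, each row a list of 0/1 pixels (1 = ink).
    """
    histogram = [sum(row) for row in image]  # ink count per row
    lines = []
    start = None
    for y, count in enumerate(histogram):
        if count > 0 and start is None:
            start = y                      # a text line begins
        elif count == 0 and start is not None:
            lines.append((start, y - 1))   # a blank row ends the line
            start = None
    if start is not None:
        lines.append((start, len(histogram) - 1))
    return lines


page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]
text_lines = find_text_lines(page)
```

For this small page the result is two lines, rows 1 to 2 and row 4.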

The resulting characters are then automatically clustered
into groups in an attempt to separate the different
characters. An expert, shown in silhouette, then corrects any
segmentation errors and/or grouping errors and tags the
characters with the desired codepoints, block 17.

As will be explained in more detail subsequently, at block 
18, real feature vectors are generated for the feature universe 
under examination. The real feature vectors are converted to 
binary vectors as part of the feature selection algorithm. The 
feature selection algorithm then processes each class to 
determine the maximal exclusive subsets and the corre- 
sponding contribution metrics. The feature selection algo- 
rithm continues to reduce the feature universe until a peak- 
ing determination is made. Once the final feature set is 
established, an ICR template is generated that corresponds 
to the input training data. 
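
The conversion from real feature vectors to binary vectors can be illustrated with a small sketch. It assumes evenly spaced thresholds between the minimum and maximum of each feature, and it assumes two bits per threshold (value at or above the threshold, value at or below it) so that the conjunction of several samples bounds a hyper-rectangle on both sides. Both choices are assumptions made for illustration; the exact encoding is the one given in the Kudo and Shimbo articles.

```python
def make_thresholds(values, count):
    """Evenly spaced interior thresholds between min and max (assumed layout)."""
    low, high = min(values), max(values)
    step = (high - low) / (count + 1)
    return [low + step * (k + 1) for k in range(count)]


def binarize(sample, thresholds_per_feature):
    """Encode a real feature vector as an integer bit vector.

    Each threshold t contributes two bits: (value >= t) and (value <= t).
    """
    bits = 0
    for value, thresholds in zip(sample, thresholds_per_feature):
        for t in thresholds:
            bits = (bits << 1) | (1 if value >= t else 0)
            bits = (bits << 1) | (1 if value <= t else 0)
    return bits


# One feature observed over the training set with range 0.0 .. 4.0.
thresholds = make_thresholds([0.0, 4.0], 3)
encoded = binarize([2.5], [thresholds])
```

Here the thresholds are 1.0, 2.0 and 3.0, and the sample value 2.5 encodes to the bit pattern 101001.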

Once the template is prepared, the remaining source data 
can be processed by the ICR engine 20 as an online 
document, block 22. This consists of reading the template 
library, segmenting the input data, and performing the 
recognition based upon the minimized distance measures. 
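
Recognition by minimized distance can be sketched as below. The sketch assumes each template is a hyper-rectangle with a per-feature confidence value, that the per-feature distance is zero inside the template's range and otherwise the distance to the nearest edge, and that the final distance is the confidence-weighted sum over features. It works on real-valued intervals for readability rather than on the binary vectors the specification uses, and the `Template` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Template:
    codepoint: str
    low: list    # lower edge of the hyper-rectangle, one entry per feature
    high: list   # upper edge, one entry per feature
    confidence: list  # contribution (confidence) value per feature


def feature_distance(value, low, high):
    """Distance from a value to the interval [low, high]; zero if inside."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def weighted_distance(sample, template):
    """Sum of per-feature distances, each scaled by its confidence value."""
    return sum(
        c * feature_distance(x, lo, hi)
        for x, lo, hi, c in zip(sample, template.low, template.high,
                                template.confidence)
    )


def classify(sample, templates):
    """Nearest-template rule (K = 1): pick the smallest weighted distance."""
    return min(templates, key=lambda t: weighted_distance(sample, t)).codepoint


library = [
    Template("A", [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]),
    Template("B", [3.0, 3.0], [4.0, 4.0], [0.5, 0.5]),
]
label = classify([1.0, 2.0], library)
```

The sample lies inside template "A" along the first feature and one unit outside it along the second, giving a distance of 1.0 against 1.5 for "B", so it is labelled "A".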

The process may be repeated as often as necessary. 
Examination of alternative feature sets may be performed as 
new features are proposed by research efforts. The process 
would be repeated to generate new engines to support 
additional languages or data sources. 

Referring now to FIG. 2, it represents the data structure 
used in the automated feature selection process. The Figure 
uses Object Model Notation as described in the book by J. 
Rumbaugh et al. entitled Object Oriented Modeling and 
Design, 1991, Prentice Hall. The diamond shape symbol 
signifies "one" connected to the filled in circle which 
signifies "many." As seen in the figure, one set has many 
classes. The dotted lines indicate a conceptual relationship 
which is not necessarily required in the implementations. 
The fundamental unit of data is the binary feature "Vector" 
represented in block 28. Binary feature vectors are aggre- 
gated into a "Class" block 30, which represents a character, 
or one of several representations of a character (i.e. S⁺). The
classes are then aggregated into a "Set" (block 32) which
contains the entire universe of training data. A "View"
(block 34) is an aggregation of vectors within one class. And
finally a "Collection" (block 36) is an aggregation of views, 
which all correspond to the same class. Each of these has a 
set of data attributes and functions used to operate on the 
data attributes. 
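
One possible rendering of this object model is sketched below, assuming binary feature vectors are plain integers and a view is stored as a bitmask over the sample indices of its class. The class and method names are illustrative only; FIG. 2 does not prescribe an implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CharClass:
    """A "Class": all binary feature vectors for one codepoint."""
    codepoint: str
    vectors: list = field(default_factory=list)


@dataclass
class View:
    """A "View": a subset of one class, held as a bitmask of sample indices."""
    owner: CharClass
    members: int

    def samples(self):
        return [v for i, v in enumerate(self.owner.vectors)
                if (self.members >> i) & 1]


@dataclass
class Collection:
    """A "Collection": views that all belong to the same class."""
    owner: CharClass
    views: list = field(default_factory=list)


@dataclass
class TrainingSet:
    """A "Set": every class in the training data."""
    classes: list = field(default_factory=list)

    def s_minus(self, codepoint):
        """All vectors except those of the class under consideration."""
        return [v for c in self.classes if c.codepoint != codepoint
                for v in c.vectors]


a = CharClass("A", [0b0001, 0b0010, 0b0011])
b = CharClass("B", [0b0100])
training = TrainingSet([a, b])
view = View(a, 0b101)  # samples 0 and 2 of class "A"
```

Storing a view as a bitmask keeps membership tests and subset checks down to single integer operations when the class holds a small number of samples.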

An implementation of the distributed, iterative feature 
evaluation tool is represented in FIG. 3. Within the select
features activity, processing consists of three phases: data
initialization, processing of classes, and result gathering.
The processing architecture is designed to support the use of 
multiple processes against the same set of data as well as the 
ability to interrupt and restart processing. 

Data initialization consists of three activities: "Build
binary vectors" (block 40), "Build S⁻" (block 42), and "Start
children" (block 44). These activities are performed only by
a parent process. The build binary vector step 40 consists of
converting real feature vectors from a feature file 41 into
binary feature vectors which are stored in a binary feature
file 43. The resultant binary feature vectors contain feature
values for the entire feature universe (S⁻), which are stored
in a set file 45. A mapping is maintained to identify the
current feature set. As each class is binarized it is placed into
the set, S⁻. Since S⁻ contains every class, during processing
it is necessary to skip the codepoint of the class under
consideration (S⁺) when it is encountered within S⁻. If the
number of samples in a class is larger than a user specified
parameter, the class is reduced to a desired number of
samples. The reduction for S⁻ is based on Hamming distance
and logical disjunction. The parent then initiates child
processes (block 44) for processing individual classes. The use
of children and the number to use can be selected by the user.
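
The text says only that the reduction uses Hamming distance and logical disjunction. One way this could work, offered purely as an assumption, is to merge the two closest vectors with a bitwise OR and repeat until the class is small enough. The function names and the greedy closest-pair strategy below are illustrative, not the patent's stated procedure.

```python
def hamming(a, b):
    """Number of bit positions in which two binary vectors differ."""
    return bin(a ^ b).count("1")


def reduce_class(vectors, limit):
    """Greedily OR together the closest pair until at most `limit` remain."""
    vectors = list(vectors)
    while len(vectors) > limit and len(vectors) > 1:
        best = None
        for i in range(len(vectors)):
            for j in range(i + 1, len(vectors)):
                d = hamming(vectors[i], vectors[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = vectors[i] | vectors[j]  # logical disjunction of the pair
        vectors = [v for k, v in enumerate(vectors) if k not in (i, j)]
        vectors.append(merged)
    return vectors


reduced = reduce_class([0b1100, 0b1101, 0b0011], limit=2)
```

In this example the pair 1100 and 1101 differ in one bit, so they are merged into 1101 and the class shrinks from three vectors to two.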

In decision block 46, the parent and child processes
select classes for processing until each class has been
processed using the current feature set (S⁻). After selecting
the class to be processed as S⁺ (block 48, with clusters stored
in cluster file 47), the process determines if a reduction in
sample size is required. Reduction is recommended if more
than 32 samples exist in a class (i.e. 32 samples allows a
view to be manipulated using integer operations). The exact
limit can be specified by the user. Once the class has been
prepared for use as S⁺, maximal exclusive subsets are
identified using a non-recursive version of the base algorithm
described in the Kudo et al. articles in place of the
recursive version described in the articles, block 50. Each
of these maximal exclusive subsets is evaluated
against the current feature set based upon the subset's ability
to discriminate against S⁻, block 52. The binary vector
which represents the view has all redundant features cleared.
The view, the resulting vector from the view, and the feature
evaluations are stored to memory.

Once each class has been processed against S⁻, the parent
process combines the metrics for each class so that a single
metric is available that describes the contribution of each
feature in the current feature set, block 52. Those features
which do not contribute significantly (as configured by the
user) are discarded, block 54. If all the features are
significant, those that contribute the least will be removed.
The user can configure a desired rate of feature removal. The
structure indices are then tested to see if the feature evaluation
process should terminate, block 56. Once the final
feature removal has occurred, the last set from result files 57
and 59 combine in generating the template library, block 58.

Referring now to FIG. 4, it shows how, in accordance with
the teachings of this invention, the automated process for
feature selection is incorporated into an automated method
of creating an OCR engine for a given language and/or
source. An optional disk cache 61 is used as a temporary
storage in those hardware implementations where adequate
common memory is not available. It starts with document
samples (block 60) and ends with an OCR engine 62
optimized for the sample source. As previously explained,
the sample data features are inputted, block 64, and
binarized, block 66. Binarization converts real feature vectors
to binary feature vectors. To efficiently use the bit space
allotted to the binary vector, thresholds are defined (e.g.
P+2, where P is defined in the article by M. Kudo and M. Shimbo
entitled "Feature Selection Based on the Structural Indices
of Categories," Pattern Recognition 26 (1993), page 893,
column 2), and the real sample vector is transformed based
on its position relative to the thresholds.

A data reduction step 68 includes the Identify Subclass
step of FIG. 3. The activity "Identify Subclass," block 70,
requires the greatest amount of CPU time. The base algorithm
for finding subclasses described in the M. Kudo and
M. Shimbo article entitled "Optimal Subclasses with
Dichotomous Variables for Feature Selection and
Discrimination," IEEE Trans. Syst. Man Cybern., 19 (1989)
pp. 1194-1199, performs at least one iteration through the
recursive procedure ENUMSAT. The class S⁺ may not
contain any subclasses. Predetermination of the existence of
subclasses within S⁺ can completely remove the "Identify
Subclass" activity. The existence of subclasses is easy to
detect. If a vector generated from the view consisting of each
sample in S⁺ is exclusive against S⁻, then each sample is
exclusive against S⁻. In this case a single view consisting of
each sample will be the only entry within the collection
Ω(S⁺,S⁻).

To reduce the total number of iterations required, once a
subclass divides, each resulting portion needs to be examined
to determine if it is sufficient or if further subdivision
is required. The same subdivision may be identified many
times upon examination of the divisions. This duplication is
redundant and in a large problem space requires significant
processing time. To address this issue, an iterative algorithm
is used in accordance with the teaching of this invention. A
work list 72 is maintained that corresponds to the original
recursive call. Redundant entries are not placed onto the
work list. This significantly reduces the number of traversals
of S⁻ required to determine if subsets are exclusive.

The loop within the flow diagram (FIG. 3) from the
activity "Build S⁻" through the decision "Desired Reduction
Achieved" reduces the feature universe, block 74. The base
algorithm drops features from the universe using two rules.
First, features are removed if the feature contribution does
not exceed a configurable threshold, block 76. Then, if no
features were dropped in the first step, a single feature is
selected for removal. At block 79, a peaking determination
is used to determine if the process should be used against the
reduced feature space. This peaking determination can be
accomplished by examining the rate of change within the
significance metrics.

The results from the feature selection mechanism are used
to generate a template library, block 80. As described in
"Feature Selection Based on the Structural Indices of
Categories", page 896:

c = a class
Sc⁺ = S⁺ for class c
|Sc⁺| = number of sample vectors in Sc⁺
Sc⁻ = S⁻ for class c
|Sc⁻| = number of sample vectors in Sc⁻
G = a view in Ω(Sc⁺, Sc⁻)
|G| = number of sample vectors in G
α(G,i) = α(G) with the ith feature zeroed out (conceptually
this removes both sides of the hyper-rectangle for that
feature's dimension)
C⁻(α(G,i)) = the subset of vectors in α(G,i) exclusive
against Sc⁻
|C⁻(α(G,i))| = number of sample vectors in C⁻(α(G,i))

$$p^{+}(G,c) = \frac{|G|}{|S_c^{+}|} \qquad (4)$$

Conceptually eq. 4 is the degree of contribution of G to
Sc⁺.

$$p^{-}(G,i,c) = \frac{|C^{-}(\alpha(G,i))|}{|S_c^{-}|} \qquad (5)$$

Conceptually eq. 5 is the degree to which feature i is
important in order to make G be exclusive against Sc⁻.
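
Returning to the Identify Subclass activity, the single-view pre-check and the work list that rejects redundant entries might be sketched as follows. The sketch assumes views are bitmasks over at most a machine word of samples, that a vector is exclusive when it covers no vector of S⁻ (v ∧ y ≠ v for all y), and that a non-exclusive view is divided by removing one sample at a time. This is an illustrative stand-in for a non-recursive search, not a transcription of ENUMSAT or of the patented procedure.

```python
from collections import deque


def conjunction(vectors, view):
    """AND together the vectors whose index bit is set in the view bitmask."""
    acc = -1  # all ones; the view is assumed non-empty
    index = 0
    while view:
        if view & 1:
            acc &= vectors[index]
        view >>= 1
        index += 1
    return acc


def is_exclusive(vector, s_minus):
    """Assumed test: the vector covers no vector of S-."""
    return all(vector & y != vector for y in s_minus)


def maximal_exclusive_views(s_plus, s_minus):
    full = (1 << len(s_plus)) - 1
    # Pre-check: if the view of every sample is exclusive, it is the only entry.
    if is_exclusive(conjunction(s_plus, full), s_minus):
        return [full]
    found = []
    seen = {full}             # entries ever placed on the work list
    work = deque([full])      # processed largest views first
    while work:
        view = work.popleft()
        if any(view & m == view for m in found):
            continue          # contained in a known maximal view
        if is_exclusive(conjunction(s_plus, view), s_minus):
            found.append(view)
            continue
        bit = 1
        while bit <= view:    # divide by dropping one sample at a time
            child = view & ~bit
            if view & bit and child and child not in seen:
                seen.add(child)
                work.append(child)
            bit <<= 1
    return found


s_plus = [0b1110, 0b1101, 0b0111]
views = maximal_exclusive_views(s_plus, [0b1100])
```

For this input the full view is not exclusive, and the search returns the two-sample views 110 and 101; the single-sample views are never tested against S⁻ because they fall inside views already found. The `seen` set plays the role of the redundancy check on the work list.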



Now the contribution metric for a feature i for class c is
the summation across all G in Ω(Sc⁺, Sc⁻) of
[p⁺(G,c)*p⁻(G,i,c)].

The contribution metric for a feature across the entire Set
(all classes) is the summation of all contribution metrics for
a feature across all classes.
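
A minimal sketch of the per-class contribution metric follows. It assumes each feature owns a known group of bits (given as a mask), that zeroing a feature means clearing those bits in α(G), and that |C⁻(α(G,i))| counts the vectors of Sc⁻ that the relaxed vector then covers. That last reading is an assumption about the garbled definition. The sketch evaluates each feature independently and so does not model the rule that features with zero contribution are left zeroed out.

```python
def conjunction(vectors, view):
    """AND together the vectors whose index bit is set in the view bitmask."""
    acc = -1
    index = 0
    while view:
        if view & 1:
            acc &= vectors[index]
        view >>= 1
        index += 1
    return acc


def class_contribution(s_plus, s_minus, views, feature_masks):
    """Sum p+(G,c) * p-(G,i,c) over all views G, for each feature i."""
    totals = [0.0] * len(feature_masks)
    for view in views:
        alpha = conjunction(s_plus, view)
        p_plus = bin(view).count("1") / len(s_plus)        # |G| / |Sc+|
        for i, mask in enumerate(feature_masks):
            relaxed = alpha & ~mask                        # alpha(G, i)
            covered = sum(1 for y in s_minus if relaxed & y == relaxed)
            totals[i] += p_plus * (covered / len(s_minus)) # times p-(G,i,c)
    return totals


# Two features of two bits each; one view holding both samples of the class.
masks = [0b1100, 0b0011]
contribution = class_contribution(
    s_plus=[0b1010, 0b1110],
    s_minus=[0b0110, 0b0101],
    views=[0b11],
    feature_masks=masks,
)
```

Here removing the first feature lets the view cover one of the two S⁻ vectors, so its metric is 0.5, while removing the second feature changes nothing and its metric is 0.0. The set-wide metric would then be the element-wise sum of these lists over all classes.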

Note however that features are left zeroed out if the
contribution metric for the feature is zero (i.e. |C⁻(α(G,i))| is
zero). This implies that feature confidence metrics are
dependent on previous iterations. Using one pass through
the features for one G would tend to favor the features
looked at last. In the present invention, to avoid favoring the
features looked at last, the following steps are followed:

1) Calculate the contribution metrics as before. 

2) Sort the features from the largest contribution to least. 

3) Reset all of the features to their original values. 

4) Pass through the features as before using the new order. 
5) Repeat, starting at step 2, unless one of the following
conditions is met:

a) A maximum number of iterations allowed is
exceeded (on the order of 5).

b) No new features are added to the set of good
features.

c) The contribution metrics remain substantially
unchanged as the evaluation order changes.

Contribution metrics will change when evaluated in a
different order. When no significant change occurs, as indicated
in condition c above, it is an indication that the process
can stop. The template library 80 is then used by the
recognition engine to process source samples.

The parallel implementation is not graphically represented
here. Basically, the steps between and including data
reduction and feature evaluation can be done independently
for each S⁺ and the resulting template libraries merged in
step 82.

Given the feature selection algorithm and a base ICR tool,
it is now possible to develop and test ICR engines that are
customized to the source data set. A small portion of the
source data is selected as the training set. Using the base ICR
tool this training set is properly segmented and a truth model
established for each of the characters. Real feature vectors
are generated for the feature universe under examination.
The real feature vectors are converted to binary vectors
within the feature selection algorithm. The feature selection
algorithm then processes each class to determine the maximal
exclusive subsets and the corresponding contribution
metrics. The feature selection algorithm continues to reduce
the feature universe until a peaking determination is made.
Once the final feature set is established, the ICR template is
generated that corresponds to the input training data.

Once the template is prepared, the remaining source data
can be processed by the ICR engine. This consists of reading
the template library, segmenting the input data, and performing
the recognition based upon the minimized distance
measures.

The process may be repeated as often as necessary.
Examination of alternative feature sets may be performed as
new features are proposed by research efforts. The process
would be repeated to generate new engines to support
additional languages or data sources.

The results from the feature selection mechanism are used
to generate a template library as set forth in the I. Bella and
G. Macey paper "Feature Selection Given Large Numbers of
Classes and a Large Feature Universe," Proceedings 1995
Symposium on Document Image Understanding
Technology, October 24-25, pp. 202-212. The paper is
hereby incorporated by reference. This template library is
then used by the recognition engine to process source
samples. The data contained within the results and the
comparison process is provided for completeness.

The feature extraction algorithm produces the following
data:

Thresholds (Bucket Ranges): The information required to
convert from a real feature value to the binary representation.

Feature Map: Identification of the features that comprise
the final selected feature set.

Class Data: There are multiple data elements generated for
each class. The complete set is: the codepoint of the
class, the binary vectors and the related confidence
values representing the maximal exclusive subsets
(Ω(S⁺,S⁻)) within the class, and a final set of confidence
values for the class as a whole. The class wide
confidence values are aggregations of the confidence
values for each subset.

Final Confidence Values: A contribution metric for each
feature in the result set. This metric is the aggregation
of the class contribution metrics.

This set of data is used to recognize each character within
the source materials.

The recognition of a source character is a three step
process. The character image is converted into a binary
feature vector using the thresholds. This binary vector is
then compared against each of the template vectors, generating
a distance measure. Finally, the template which minimizes
the distance measure is selected as the correct class for
the source character.

The distance measured between a source binary vector (S)
and template vector (T), each composed of N features, is
given by equations 6 and 7 below.

[Equation 6, defining the per-feature distance d_i between S and T, is illegible in the source.]

$$D = d \times c = \sum_{i=1}^{N} c_i \, d_i \qquad (7)$$

Each bit in a binary vector denotes whether the value of the
examined feature is greater or less than the threshold or
bucket range. Each feature has a
feature distance measure (d_i) in equation 6 that represents
how far the sample is from the edge of a template's valid
range, which is the interior of the hyper-rectangle. The final
distance is the feature distance multiplied by the confidence
value for that feature. The final distance function (D)
may be used with any of the three different confidence
vectors: subset, class, and set.

An example in two-dimensional feature space is shown in
FIG. 5. The distance from the sample P (the sample to be
recognized) to hyper-rectangle B1 is d1*c1(B1) + d2*c2(B1),
where c1(B1) is the contribution factor of Feature 1 for B1
and likewise for c2(B1). The same distance is calculated
between the sample P and all hyper-rectangles. A K-nearest
neighbor algorithm is used to decide which hyper-rectangle
wins. For K=1, this is simply the hyper-rectangle with the
smallest distance. In this example, with K=1, the hyper-rectangle
A2 might win with a distance of d3*c2(A2) + 0*c1(A2)
and hence the sample P would be classified as an 'A'.

While the invention has been described in terms of a
single preferred embodiment, those skilled in the art will
recognize that the invention can be practiced with modification
within the spirit and scope of the appended claims.

Having thus described our invention, what we claim as
new and desire to secure by Letters Patent is as follows:

1. A computer automated method for machine recognition
of character images in source material including the steps of:

selecting a sample portion of said source material as a
training set;

segmenting said training set;

grouping characters segmented in said segmenting step
into classes;

generating feature vectors for each of said classes;

generating a feature set by processing each class to
determine a maximal exclusive subset and a corresponding
metric iteratively until a peaking determination
is made, wherein said processing includes predetermining
the existence of classes of said feature set by
testing a vector generated from a view consisting of
each sample in said feature set to determine if the
vector is exclusive against said feature sets for all
characters of said training set;

generating an image character recognition template
corresponding to said training set;

processing on line said source material with character
recognition template.

2. A computer automated method for machine recognition
of character images in source material as in claim 1 includ- 
ing the further step of eliminating feature vectors that 
contribute less than a predetermined level of exclusivity of 
said feature set. 

3. A computer automated method for machine recognition 
of character images in source material as in claim 1 wherein 
said generating step includes the further steps of maintaining 
a work list of subclasses and entering into said work list only 
subclasses not previously entered. 

4. A computer automated method for creating an image 
recognition engine for a universe of characters comprising 
the steps of: 

selecting samples from a universe of characters as a 
training set; 

segmenting samples from the universe of characters; 
determining a feature set for identifying each character in 
the samples; 

evaluating features in said feature set to determine maxi- 
mal subsets of said feature set that are exclusive of 
feature sets for all characters of said universe of 
characters, wherein said evaluating step includes a step 
of predetermining the existence of classes of said 
feature set by testing a vector generated from a view 
consisting of each sample in said feature set to deter- 
mine if the vector is exclusive against said feature sets 
for all characters of said universe of characters; 

determining the contribution metrics for each feature in 
each class for each set, wherein the classes are evalu- 
ated iteratively until a peaking determination is made; 

eliminating features that contribute less than a predeter- 
mined level of exclusivity of said feature set; and 

developing a template library of feature sets for use in an 
optical character recognition engine. 

5. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 4 
further comprising the steps of maintaining a work list of 
subclasses and entering into said work list only subclasses 
not previously entered. 

6. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 4 
including the further step of converting real feature vectors 
to binary feature vectors by defining a series of threshold 
values with a binary value assigned to each threshold value, 
and assigning the binary value to each real feature value 
based on its threshold value. 
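
By way of illustration only, the thresholding step recited in claim 6 might be sketched in Python as follows. The claim does not fix an encoding, so this sketch assumes one bit per threshold, set when the real feature value meets or exceeds that threshold. The function names `binarize_feature` and `binarize_vector` and the example thresholds are hypothetical and do not appear in the specification.

```python
from typing import List, Sequence


def binarize_feature(value: float, thresholds: Sequence[float]) -> List[int]:
    """Return one bit per threshold: 1 if value >= threshold, else 0.

    Assumption: thresholds are listed in ascending order, so the result
    is a run of 1s followed by a run of 0s.
    """
    return [1 if value >= t else 0 for t in thresholds]


def binarize_vector(values: Sequence[float],
                    thresholds_per_feature: Sequence[Sequence[float]]) -> List[int]:
    """Concatenate the per-feature bit strings into one binary feature vector."""
    bits: List[int] = []
    for value, thresholds in zip(values, thresholds_per_feature):
        bits.extend(binarize_feature(value, thresholds))
    return bits


if __name__ == "__main__":
    # Two real-valued features; the first has two thresholds, the second has one.
    print(binarize_vector([0.1, 0.9], [[0.25, 0.5], [0.5]]))
```

Other encodings, such as assigning a multi-bit code to each threshold interval, would also be consistent with the claim language.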

7. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 4 
wherein said steps of evaluating and eliminating are 
repeated until a desired reduction in the feature universe is 
achieved and wherein a percentage of features are elimi- 
nated each time said steps of evaluating and eliminating are 
repeated. 
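
As a non-limiting sketch, the repeated evaluating and eliminating of claim 7 can be pictured as a loop that drops a fixed percentage of the lowest-contributing features on each pass until a target feature count is reached. The scoring callable `contribution`, the parameter names `fraction` and `target`, and the rule of dropping at least one feature per pass are assumptions made for this example. The contribution metric itself is the one described in the specification and is not reproduced here.

```python
import math
from typing import Callable, List, Sequence


def prune_features(features: Sequence[str],
                   contribution: Callable[[str, List[str]], float],
                   fraction: float = 0.1,
                   target: int = 10) -> List[str]:
    """Repeatedly drop the lowest-contributing features.

    Each pass re-scores the surviving features, removes `fraction` of them
    (at least one, and never below `target`), and stops at `target` features.
    """
    kept = list(features)
    while len(kept) > target:
        ranked = sorted(kept, key=lambda f: contribution(f, kept))
        n_drop = max(1, math.floor(len(kept) * fraction))
        n_drop = min(n_drop, len(kept) - target)
        dropped = set(ranked[:n_drop])
        kept = [f for f in kept if f not in dropped]
    return kept


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}
    print(prune_features(list(scores), lambda f, kept: scores[f],
                         fraction=0.25, target=2))
```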

8. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 4 
including the further step of converting real feature vectors 
to binary feature vectors by defining a series of threshold 
values with a binary value assigned to each threshold value, 
and assigning the binary value to each real feature value 
based on its threshold value. 

9. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 5 
including the further step of converting real feature vectors 
to binary feature vectors by defining a series of threshold 
values with a binary value assigned to each threshold value, 
and assigning the binary value to each real feature value 
based on its threshold value. 

10. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 4 
wherein said steps of evaluating and eliminating are
repeated until a desired reduction in the feature universe is 
achieved. 

11. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 5 
wherein said steps of evaluating and eliminating are 
repeated until a desired reduction in the feature universe is 
achieved. 

12. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 6 
wherein said steps of evaluating and eliminating are 
repeated until a desired reduction in the feature universe is 
achieved. 

13. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 10 
wherein a certain percentage of features are eliminated each 
time said steps of evaluating and eliminating are repeated. 

14. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 11 
wherein a certain percentage of features are eliminated each 
time said steps of evaluating and eliminating are repeated. 

15. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 12 
wherein a certain percentage of features are eliminated each 
time said steps of evaluating and eliminating are repeated. 

16. A computer automated method for creating an image 
recognition engine for a universe of characters as in claim 4 
wherein said step of evaluating features is carried out in 
parallel in a plurality of processors for a plurality of feature 
sets for identifying each character. 

17. A computer automated method for creating an image 
recognition engine for a universe of characters by selecting, 
from a large universe of features, subsets of features to 
optimize recognition accuracy, comprising the steps of: 

segmenting samples from the universe of characters; 

extracting feature sets for each character in the sample; 

determining binary vectors for each character from the 
extracted feature sets; 

evaluating features in said feature set to determine maxi- 
mal subsets of said feature set that are exclusive of 
feature sets for all characters of said universe of 
characters, wherein said evaluating step includes a step 
of predetermining the existence of classes of said 
feature set by testing a vector generated from a view 
consisting of each sample in said feature set to deter- 
mine if the vector is exclusive against said feature sets 
for all characters of said universe of characters; 

determining the contribution metrics for each feature in 
each class for each set, wherein the classes are evalu- 
ated iteratively until a peaking determination is made; 

eliminating features that contribute less than a predeter-
mined level to exclusivity of said feature set; and

developing a template library of feature sets for use in an 
optical character recognition engine. 

18. The computer automated method of claim 4, further 
comprising the steps of: 

determining binary vectors from the template library; 

minimizing the distance measure between templates and 
the binary vectors by finding the distance between the 
binary vector, which is a point, and the closest template 
edge for each feature; and 

providing each character associated with the closest tem- 
plate or templates as performed by the K-Nearest 
Neighbor test. 
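
The recognition step of claim 18 may be illustrated, under stated assumptions, by the following Python sketch. Each template is modeled as a hyper-rectangle given by a (low, high) interval per feature; the distance from a binary vector to a template is taken as the sum, over features, of the distance to the nearest edge of that interval, and is zero for a feature lying inside it. Summing the per-feature distances, the majority vote, and the names `edge_distance` and `classify` are assumptions of this example; the claim specifies only the per-feature edge distance and the K-Nearest Neighbor test.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Bounds = Sequence[Tuple[int, int]]  # one (low, high) interval per feature


def edge_distance(point: Sequence[int], template: Bounds) -> int:
    """Sum of per-feature distances from the point to the closest template edge."""
    total = 0
    for x, (low, high) in zip(point, template):
        if x < low:
            total += low - x
        elif x > high:
            total += x - high
        # a value inside [low, high] contributes nothing
    return total


def classify(point: Sequence[int],
             templates: List[Tuple[str, Bounds]],
             k: int = 3) -> str:
    """Return the majority label among the k templates closest to the point."""
    nearest = sorted(templates, key=lambda t: edge_distance(point, t[1]))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    library = [
        ("A", [(0, 1), (0, 0)]),
        ("B", [(1, 1), (1, 1)]),
        ("A", [(0, 0), (0, 1)]),
    ]
    print(classify([1, 1], library, k=1))
```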
BACKGROUND

The invention pertains to methods for programming
computers and the software tools for implementing 
these methods. 

Early computer programming methods were limited
to procedure based languages in which the actions of
the computer were controlled by a long sequence from
beginning to end. During execution of the sequence,
control might be passed from one program to another
which was specified by the first program and then control
might be returned to the original program or might
continue on to a third. In any event, control followed a
sequence established by the programmer.

In order to model the reasoning processes of human
experts, expert computer systems were devised which
included long lists of facts and long lists of rules stating
inferences that might be drawn if and when certain facts
exist. When a rule is satisfied, consequences will typically
cause the specification or updating of some of the
facts. These changed facts will then cause other rules to
become satisfied and the choice must again be made of
which rule to apply next. In such rule-based systems,
there are typically many different rules which are satisfied
by any given condition of the set of facts and the
application of one rule ahead of another will frequently
cause a difference in the result or the speed at which it
is achieved.

For this reason, and also because the number of rules
is so large that applying every rule which is satisfied in
each iteration of the system would be prohibitively
slow, considerable research and experimentation has
been devoted to developing "control strategies" and
"inferencing methods" for determining which of the
rules should be applied when. For a given set of facts
and a given set of rules, different control strategies will
produce different results or will reach the results with
different efficiencies.

The creation of rule-based expert systems is a complex
task, typically requiring a "knowledge engineer" to
work with an expert to translate the expertise into long
lists of potentially relevant facts and long lists of rules.
Once the system is built, it is difficult to verify that the
system will produce correct results in each situation and
it is difficult to analyze the exact steps taken by the
system to achieve each result.

Another architecture for modeling human expertise
in computers uses "frames" or "objects" to represent
items or classes of items in the real world, and then
allows "children" of those frames or objects to be defined
which inherit characteristics of their "parents" but
are further differentiated from their "siblings" with
additional information. Such a system is quite effective
for representing taxonomic knowledge such as the
method of organizing animal life into kingdoms, phyla,
classes, orders, families, genera, and species.

SUMMARY OF THE INVENTION 

The invention is a novel method of programming a
computer system and the various software tools and
components which implement this method. Although
the method was developed to meet the need for improved
expert systems, it has been discovered that the
method is also useful for programming many other
types of computer systems and is not limited to use for
expert systems.

A fundamental concept of the invention is that of the
"object": a necessary fact, a calculated result, a conclusion,
or an identifiable element of the expert analysis
that is being modeled. Experts do not use the term
"objects", but they work with such objects all the time.
When human experts use their expertise to solve a prob- 
lem in the domain being modeled, they think by manip- 
ulating such objects. They ask questions to acquire 
necessary facts for their analysis; they calculate results; 
they come to intermediate conclusions, which affect 
their later reasoning; and they calculate or conclude 
final answers or recommendations. What makes them 
experts at solving such problems is that they have a 
long-standing familiarity with these elements of their 
analysis and they know the relationships among them. 
They know what questions to ask under what circum- 
stances, and what consequences flow from which facts. 

Although this description of the invention employs 
the word "objects", the meaning of this word has little 
in common with the "objects" of prior art expert sys- 
tems. 

The invention is designed to eliminate the "knowl- 
edge engineer", who serves in conventional expert sys- 
tems as an intermediary between the domain expert and 
the system being built. The expert is typically not a 
computer programmer, and cannot be expected to 
know how to model his expertise in software, using 
conventional languages or expert system shells. The 
knowledge engineer, in the building of a typical expert 
system, forms this crucial link between the expertise to 
be modeled and the computer programming that is 
usually required in order to model it. The knowledge 
engineer interviews the expert to elicit the expertise and 
knowledge, and then applies his or her own computer 
programming expertise to represent that knowledge in 
the software. Unfortunately, there is enormous poten- 
tial for error in this process, and it can be frustrating and 
expensive for all concerned. 

The invention allows an expert to model expertise 
directly in the software. The object paradigm is intu- 
itive enough that most experts take to it relatively eas- 
ily, and can design their own systems first-hand. To 
create a knowledge base, one simply creates objects and 
then specifies the relationships among them in order to 
perform the analysis. The analysis happens in a specific 
sequence specified by the expert. This exactly mirrors
one of the ways experts solve problems in the real 
world: they start by acquiring relevant facts, and then 
"reason forward" in some way from those facts to ar- 
rive at the consequences that flow from them. Each 
fact, consequence and all other factors which are useful 
in the course of the analysis become objects in the 
knowledge base. 

During execution of a consultation, as facts are ac- 
quired from the user, some objects will turn out to be 
important, while others will be found to be irrelevant to 
the analysis and therefore will be ignored. In this way 
the system responds flexibly to different inputs and 
thereby effectively replicates in software the expertise 
of the designing expert.
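
The forward pass just described can be pictured with a minimal Python sketch, offered as an illustration rather than as the preferred embodiment (which is implemented in FoxPro). Objects are visited once, in the sequence the developer gave them; an object found relevant under the values gathered so far sets a value for itself, and an irrelevant object is skipped. The class name `KBObject`, its fields, and the function `run_consultation` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Values = Dict[str, Any]


@dataclass
class KBObject:
    name: str
    is_relevant: Callable[[Values], bool]  # test against values gathered so far
    evaluate: Callable[[Values], Any]      # value the object sets for itself


def run_consultation(objects: List[KBObject],
                     facts: Optional[Values] = None) -> Values:
    """Visit every object once, in the developer-defined order."""
    values: Values = dict(facts or {})
    for obj in objects:
        if obj.is_relevant(values):
            values[obj.name] = obj.evaluate(values)
        # irrelevant objects are simply bypassed; nothing is searched
    return values


if __name__ == "__main__":
    knowledge_base = [
        # A question object: only needed if the fact is not already known.
        KBObject("age", lambda v: "age" not in v, lambda v: 70),
        # An intermediate conclusion that depends on an earlier value.
        KBObject("adult", lambda v: v.get("age") is not None,
                 lambda v: v["age"] >= 18),
        # A message that is relevant only for some fact patterns.
        KBObject("minor_note", lambda v: v.get("adult") is False,
                 lambda v: "guardian required"),
    ]
    print(run_consultation(knowledge_base, {"age": 30}))
```

Because later objects read only values set by earlier ones, the order of the list carries the dependency structure, which is why no rule search is needed in this sketch.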

Distinguishing features of the knowledge representa- 
tion method are that: 

(1) Action in the system takes the form of evaluating 
and firing objects, rather than firing rules as in prior art 
systems. Objects and their appropriate behavior, rather 
than rules and search strategies for applying them, become
the central focus of system design.

(2) When an object fires, among other possible actions,
it can set a value for itself.

(3) Knowledge is represented by objects and a description
of what factors can affect the relevance and
appropriate actions of objects. For most of the objects,
the relevance criteria include references to values set
for themselves by previously fired objects.

(4) The description of all such factors that could
affect an object's relevance and appropriate actions are
collected in only one place in the form of a set of
evaluatable statements associated with the object, the
object's behavior criteria.

A distinguishing label for this method is
"relevance-actions-value based programming".

The knowledge being represented in the system thus
takes the form of an enumeration of the instructions
necessary to make a binary decision: whether to take
action with respect to an object. The significance of that
decision for the analysis being modeled lies in the inherent
meaning of the object itself, as intended by the system
designer, and in the inter-object dependency relationships
that are created by these relevance criteria
instructions.

There are important distinctions between the relevance
criteria of the present invention and rules of the
prior art. Prior art rules are organized in a long list,
sometimes with groupings for ease of management,
separately from the database of "facts" or "objects".
The central question is, given the set of facts, which of
many rules that could be applied should be applied?
Different search strategies for choosing the rule to be
applied can produce different results. New rules can be
added at any time without changing existing rules.

In contrast, in the present invention, there is no
search for applicable rules. Each object is considered in
its turn and its relevance criteria are evaluated. Although
the relevance criteria for a particular object can
be modified, rules cannot be added to the system.

To translate a rule-based system of the prior art into
the relevance-actions-value based system of the present
invention, a description of relevant objects must be
developed, which will not match one for one with the
set of facts or objects in the rule based system, and then
each of the rules must be examined to determine
whether a portion of the rule should be reflected with
an appropriate expression in the behavior criteria for
one or more of the objects. With such a translation,
some possible results of the original system would not
be achievable.

Prior art systems devote processing to navigating a
decision tree and reducing the system search space in
order to make the system behave more the way we
perceive human experts to work, and to achieve greater
processing efficiency at run time. But the processing
necessary to search rule conditions, navigate the tree,
and reduce the search space, is work that the system
must perform, and therefore consumes time and system
resources. In this invention, a tradeoff is made by committing
to examine all objects in the knowledge base in
return for avoiding the navigational processing of conventional
systems. Yet the same rich network of logic,
knowledge, heuristics, and analytical dependencies are
represented in the behavior criteria which determine
the firing of objects. Performance considerations then
focus on ensuring that objects are processed efficiently,
particularly those objects that do not fire and are therefore
bypassed.

The invention achieves numerous advantages over
prior art expert system programming methods. Avoiding
the problem of search strategies allows the system to
run faster. The use of objects which have behavior
criteria and which can be given a value for reference by
other objects allows expert systems to be more easily
programmed and debugged. The sequence the program
will follow and the results that it will achieve are more
predictable. The invention allows a consultation session
to be easily restarted at any point. As discussed in the
detailed description below, the invention allows validation
tables to be constructed which allow for easy proof
that the system has been correctly constructed.

Although the programming method was invented to
meet the need for improved methods for building expert
systems, it can be used for programming many other
types of computer applications as well. It is particularly
well suited to problems which involve multiple decision
branches with interdependencies and where ease of
building, debugging, and validating the system are relatively
important compared to the processing speed of
the resulting system.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a sequence of objects containing the
essential elements of objects in the invention.

FIG. 2 shows a simple form of object processing
program which may be used with a data set comprised
of objects according to this invention.

FIG. 3 shows the various computer programs, along
with their inputs and outputs, which make up this invention.

FIG. 4 shows the various computer files which the
processing program of the invention uses to perform
a consultation.

FIG. 5 shows the file structure of the knowledge
base.

FIG. 6 shows the structure of the generated
data file.

FIG. 7 shows the structure of control [illegible].

FIG. 8 shows the data structure of the queue arrays.

FIG. 9 shows the structure of the object values
file.

FIG. 10 shows an overview flow chart of the object
processing program in the preferred embodiment.

FIGS. 11a and 11b show a detailed flow chart of the
object processing program in the preferred embodiment.

FIG. 12 shows a sample validation table as presented
by the validation table generator component of the
invention.

DETAILED DESCRIPTION OF PREFERRED
EMBODIMENT

Definition of terms

Application: An application is a set of one or more
related expert-system modules that collectively form a
conceptual whole and are designed to work together to
represent the entire problem domain modeled by the
expert system. A given application will have as its centerpiece
a database file (the Object Data File 40) containing
information about all objects used in the application,
and information about each of the modules as well.

Module: A module is a subset of the defined objects
of the application that forms a stand-alone expert system
in its own right. Each module handles one conceptually
distinct aspect or area of the expertise needed to
be proficient in the overall application domain. In the
preferred embodiment, modules are able to call other
modules as subroutines in the course of their own processing.

Knowledge base 10: A knowledge base is the collection
of data files in the invention that contains all of the
data about the application. These files consist of the
Object Data File 40, the Statement Data File 42, and the
Expression Data File 44, together with their associated
index files.

Object: Objects are the basic building blocks of a
knowledge base. Each object represents a fact, question,
conclusion, calculation, recommendation, or other
material which can, under appropriate circumstances,
be asserted as applicable in the reasoning process being
modelled. If a piece of data needs to be captured during
the course of a consultation, it will be represented by an
object.

Attribute: Properties of objects are called attributes.
According to the invention, the attributes of every
object must include "relevance criteria" 1 and a specification
of actions 2 to be taken if, upon evaluation, the
relevance criteria are satisfied, one of which may be the
setting of a value 3 for the object. In the preferred embodiment,
every object also has a descriptive name,
which is developer-supplied, and a unique identifying
number (the "identifier"), which is system-supplied, as
well as an object type designation and an attribute indicating
that the object is a member of at least one module.
Attributes for a particular object may vary from
module to module, among the modules of which it is a
member. For example, in one module an object may be
an explicit question to be asked of the user, while in
another module of the same application the same object
might be an internal conclusion which concludes a
value 3 based on the values 3 of other objects in the
module.

Modules also have attributes, as does the application
as a whole. These application-level attributes are stored
in the Object Data File in application records, which
have an object type of "A".

Object Value 3: Each object can be given a value
which can then be referenced and used by other objects.
Object values are stored in a file (see FIG. 9) created
during a consultation.

Consultation 15: The process in which a user interacts
with the developed expert system application is
called a consultation. During a consultation, the system
processes the objects entered into the knowledge base.
Various objects ask for information from the user where
necessary, and other objects draw conclusions based on
the entered information. After all necessary data has
been collected, the consultation typically continues by
processing more objects and asserting conclusions or
recommendations that are found to be applicable, based
on the data collected.

Developer & User: The developer is the person (or
group of persons) who creates the knowledge base in
order to model the desired expertise. Usually this is the
domain expert, the person who has the expertise being
modeled. The resulting expert system application is
distributed to multiple non-expert users, who benefit
from the expertise by using the system.

Generated Data File 14: Having developed the
knowledge base by creating all necessary objects and
specifying all of their attributes, the developer generates
the Generated Data File from the data in the knowledge
base with the Generator Program 8. The Generated
Data File is distributed to users together with the Object
Processing Program 9, which uses it to run the
analysis. The knowledge base itself remains with the
developer and is not distributed. Modifications to the
system are accomplished by altering the knowledge
base, generating an updated Generated Data File, and
distributing that new file to users.

Composition of an Application

The centerpiece of an application is the collection of
files known as the knowledge base 10. The Generated
Data File 14 is generated from these files, and the Object
Processing Program 9 uses the Generated Data File
in order to run the resulting system.

In the preferred embodiment, the invention is implemented
in FoxPro, a microcomputer relational database
management system. FoxPro is a variant of the "X-base"
family of languages which originated with dBASE.
The fact that X-base is the de facto industry standard
database language for micros means that the invention
allows significant connectivity with other systems. This
is enhanced by the invention's ability to call custom
subroutines from within a consultation, to exchange
data with standard external files, and to be called as a
module from external programs, such as menu systems.

A consultation can be initiated from other programs
by passing parameters which indicate the application
number and module number for the consultation. For
example, an overall menuing program might have a
menu choice for each module in an application, other
choices for modules of a different application, and still
other choices for subroutines that are not consultations.

Several other types of files are usually needed in
order to constitute the overall application in its final
form. Typically there are other data files that are created
by the developer to hold data for a historical record
of prior analyses. These can be crucial to the correct
performance of the system, since the invention
allows values to be imported from such external files
and assigned to objects in the course of processing a
consultation. The invention also can take a consultation
15 (partial or complete) in its entirety, including answers
supplied by the user, and save it in compressed
form to a memo field in a particular record in such an
external file. This allows consultations to be replayed at
a later date.

In addition to external data tables, a finished application
will usually include a file of custom subroutines
which may be called by objects during the course of
processing a consultation. The invention can open a
channel to such a file in order to allow rapid access to
such subroutines by opening it as a FoxPro procedure
file at the start of the consultation 15.

Contents of the Knowledge Base

The knowledge base 10 consists of the objects within
an application and their attribute data. The knowledge
base holds all information about the objects, their characteristics,
their interrelationships, the external files
they interact with, and the customized subroutines that
they may execute.

The knowledge base is contained primarily in the
Object Data File 40. Each record in this file holds an
object. Each object must have a name, one that should
be intuitive to the developer. It will also have a system-generated,
unique identification number (the "identifier")
used by the invention. It will also have various
attributes that distinguish it from other objects. One
fundamentally important attribute of each object is its
"relevance criteria" 1: the circumstances under which
the object is considered to be relevant to the analysis.
(An object that is determined to be relevant is said to
"fire".) Other attributes include actions that an object
may take, and action criteria which describe the conditions
under which the actions should be executed.

Other files support the Object Data File 40 to form
collectively the entire knowledge base. These are:

the Statement Data File 42, which holds unique statements
used in relevance and action criteria, and
the Expression Data File 44, which contains all
unique values that objects may acquire, as specified
in the Object Data File, and unique left-side and
right-side expressions used in criteria statements.

As shown in FIG. 5, the Object Data File 40 has the
following structure:

The ID field stores unique numeric object identifiers.
The Statement Data File and the Expression Data File
contain similar identifiers. New identifiers are assigned
as needed by finding the largest identifier in current use
and incrementing its value by one. To facilitate the use
of such identifiers as pointers, all identifiers are stored as
[illegible] character strings, yielding a system capacity of
up to 10,000 identifiers in each file ("0000" through
"9999"), [illegible].

The NAME field stores each object's name. Every
object in the knowledge base must have a unique name,
assigned by the developer upon its creation. A name
should be descriptive of the meaning of the object, or
[illegible], but is limited to FoxPro's maximum length of
10 letters for memory variables, so that such variables
may be created using these names if desired. An
implementation of the invention using a language other
than FoxPro would likely have a different constraint on
the allowed length of a memory variable name.

The ATX field stores temporary attribute codes extracted
from the ATT field. These codes are used for
indexing, to facilitate the handling of objects and the
editing of their attributes. When a module is opened so
that objects in the module may be edited, certain attribute
codes are extracted from the ATT data for the
module and stored in the ATX field, updating the index
files. For example, one of the attributes stored in ATX
is the object type code: this allows all objects of a given
type to be grouped together and viewed as a single
collection of objects.

The ATT memo field stores all attribute data (other
than ID and NAME), in the form of character strings.
Where an object is a member of more than one module,
module delimiters (described below) separate one module's
data from the next module's data. Within an attribute
data string, attribute delimiters separate one attribute's
data from the next.

System Organization

The developer of an expert system application using
this invention creates objects and organizes them into a
desired sequence. The sequence usually flows from the
approach used by the expert when solving a problem,
and the resulting sequence will reflect the progression
of the expert's thinking when analyzing the problem.
This close mapping between the flow of logic in the
system and the natural approach taken by the expert is
of great practical benefit in system design and testing,
making it possible for the expert to model his or her
own expertise directly in the system.

Object sequence is also important from the user's
point of view. Questions asked of the user, and messages
from the system to the user, will vary from consultation
to consultation, as different objects are determined to be
relevant under different fact patterns. It is beneficial to
present the questions and messages to users in as consistent
an order as possible, so that over time the user will
find the behavior of the system familiar and internally
consistent. This can be a factor in the confidence users
place in the system, and in its ultimate acceptance.

When the Generated Data File 14 is created from the
knowledge base 10 by the Generator Program 8, the
invention will resequence objects if necessary in order
to ensure that dependency relationships are respected,
but such resequencing is kept to a minimum in
order to preserve the developer-defined object sequence
to the extent possible.

Knowledge Base Delimiters

Several delimiter conventions are used in the invention
to separate attribute data within an object and to
identify module information.

Module delimiter: this identifies data for a particular
module. It takes the form of a tilde ("~"), followed by
a two-digit module number. For example, data for module
7 would appear in this way: ~07<data>.

Attribute delimiter: this marks the beginning of an
attribute data string. It consists of ASCII character 4
("♦"), followed by a numeric code identifying the
attribute. For example, the text of an on screen object is
number 5. Its attribute delimiter is:
♦05<text data>

Internal data delimiters: The invention uses several
characters as delimiters to separate data elements within
attribute data strings. These include the vertical bar
("|") and backslash ("\").

Criteria delimiter: where criteria are coupled with a
particular data element (for example, when one of
several possible values for an internal conclusion has
associated action criteria), ASCII character 254 ("■")
separates the data element from the criteria string.

Objects and Their Attributes

An object represents any individual element of the
analysis being modeled in the expert system.

An object could be, but is not limited to, any of the
following:

(1) any useful concept, idea, abstraction, or judgement,

(2) any fact that the human expert in the problem
domain might need to know in order to solve the problem,
(3) a representation of a real-world entity, such as a person, a
machine, or an organization,

(4) any question to be asked and answered by the user,

(5) any conclusion to be drawn if specified facts are present,
including intermediate conclusions and the results of calculations, or

(6) any answer, recommendation, message, or other result that might be
produced as useful output in the course of the analysis.

Each object has a collection of assigned attributes which, in a typical
system, might include:

(1) ID: A unique identifying number.

(2) NAME: A unique name, which is an intuitive descriptor defined by
the developer to connote the object's meaning.

(3) TRANSLATION: A meaningful, short, but usually multi-word
description of the object, which is displayed during execution in
various contexts as a replacement for the more cryptic object name.

(4) OBJECT TYPE: An object's type is a mandatory attribute of every
object which determines the standard actions that are performed in
order to process it appropriately when it fires. The Object Processing
Program will recognize the object type attribute for a firing object
and will call the necessary subroutines in order to process it
appropriately. Objects that are common to multiple modules in an
application can be of different types in the different modules. Within
a module, an object may be assigned multiple types (for example, a
message object may also be designated an output object, so that the
message text is output to a file after processing).

(5) SEQUENCE: The initially preferred sequence for evaluation of each
of the objects within the consultation is specified by the developer.
Developer sequences are respected where possible, to allow maximum
control over the order in which on-screen objects are presented to the
user. However, if an inconsistency is discovered between a
developer-defined sequence and an object dependency relationship when
the Generated Data File is being generated, objects will be resequenced
by the Generator Program 8 as required in order to satisfy the object
dependency relationships involved, i.e., ensuring that all objects on
which a given object depends precede it in the resulting sequence.

(6) TEXT: The text of a question or recommendation or message that is
concluded to be applicable if and when the object fires.

(7) POSSIBLE VALUES: Possible menu choices that might be presented to
the user when a question is asked on screen, or alternative values that
might be concluded by an object, including constants, literal strings,
mathematical formulas to be evaluated, default values, or references to
other object values.

(8) DATA TYPE: The data type of the object's expected value to be
concluded.

(9) LINKS: Instructions for interactions between the object and other
entities, such as importing a value from an external data file,
exporting data to such a file, or calling a subroutine.

(10) RELEVANCE CRITERIA: A set of statements to be examined when the
object is evaluated. If, upon evaluation, the relevance criteria are
satisfied, the object will "fire" at the appropriate time in the
sequence by taking the actions of performing the appropriate action for
the object type, setting a value for the object, and performing any
specified links.

(11) ACTION CRITERIA: A set of statements to be examined after an
object has fired in order to determine if possible actions are to be
performed. If, upon evaluation, the action criteria are satisfied, then
the action will be performed. Not all actions have action criteria;
where they do not, such actions are always performed.

Types of Objects

There are no hierarchies or other classifications of objects for
purposes of knowledge representation. In particular, real-world
knowledge is not explicitly represented through object classes, as is
the case in prior art "frame-based" or "object-based" expert system
development tools. The objects in the knowledge base are separate,
atomic units on a peer level with one another, and do not inherit
attributes from one another.

Every object in the knowledge base will have as an attribute at least
one assigned object type designation. The possible types of objects
are:

system object: go-to
system object: reset
system object: inter-module call
demon object (calls a subroutine if it fires)
input object: screen - user input
input object: screen - select one from predefined menu
input object: screen - multi-object list
input object: screen - message object
input object: conclusion - import
input object: conclusion - conclude a value
output object - list ID to file
output object - list text attribute to file
output object - export object value to file

An object can be designated as being of more than one type. For
example, a message object is designated as a screen object type and
will appear on screen as an input object during the consultation, and
it may also be designated as an output object, so that its text is used
for some purpose at the end of the consultation.

Input Objects

The class of input objects is divided into two subclasses, screen
objects and internal conclusion objects. Screen objects ask questions
to elicit information from a user of the system, and may require direct
data entry or a selection from a menu of alternatives. They may also
post messages on screen, to advise the user and make recommendations.
Internal conclusion objects acquire values according to predefined
settings for the Possible Values attribute, which may refer to the
values of other objects, or by importing values from external sources,
such as a database file.

Screen objects

These are objects which acquire a value supplied by the user. Text
appears on screen, and the user supplies an answer. The text of a
screen object can contain embedded values, including live data from the
current consultation. There are four types of screen objects:
user-input, select-one, messages, and multi-object lists.

(1) User-input screen objects

These objects require the user to type in an answer. For example, if
the consultation requires a figure for the dollar value of a contract,
the knowledge base should have an object that asks the user something
like "What is the value of the contract?" in order to acquire this
value. The text of the question to be displayed on screen is stored as
the text attribute of the object. Another
stored attribute specifies the number of spaces or columns on screen
that the system should provide for the user to type in the answer to
the question.

(2) Select-one screen objects

These are objects which ask the user to select one answer from a pop-up
menu of possible choices. The text of the question is displayed and the
menu of choices appears, allowing the user to make a selection. Unique
menu alternatives are stored in the Expression Data File 44, and
pointers to such values are stored in the Object Data File as the
Possible Values attribute of the object. In this way, menu alternatives
are only stored once, and may be reused by other objects by using
pointers to the alternatives.

Menu choices can be suppressed from appearing on a menu by assigning
action criteria to the menu choice. At runtime, these criteria will be
evaluated when building the menu, and if the criteria are not
satisfied, the choice will not appear. This allows menus to behave in a
context-sensitive manner, responding to the particular facts of a
consultation, so that irrelevant choices are not offered to the user.
Where no action criteria are specified for a menu choice, the choice
always appears on the resulting menu.

(3) Message screen objects

These objects are displayed on screen, but they are not questions and
do not accept any response from the user other than pressing a key to
continue. Message objects are used to display information to the user
or to post notices that call the user's attention to some fact or
result. The information displayed could be advice, recommendations for
actions, or the interim results of a particular analysis that has been
performed by the system. Frequently, a message object will have the
current values of one or more other objects embedded in its text.

Message objects are quite useful for communicating the progress and the
results of the system's analysis as it proceeds during the
consultation. Additionally, they are useful aids for developers when
debugging the knowledge base. Often, message objects are also
designated as output objects, so that their text is asserted as
applicable in some fashion at the end of the consultation, e.g. in a
report containing consultation results. If they are so designated,
their formatted text (including any embedded values) is preserved to
become the object's text as an output object.

Message objects do not acquire values per se, but are assigned a value
of "<System Message>" by the system. This allows them to be viewed and
recognized as messages when the user is reviewing the values of objects
that have been processed, and allows their values to be referenced by
other objects if desired.

(4) Multi-object lists

A multi-object list presents the user with a list of items, each of
which represents an object within the system, and allows a "mark all
that apply" approach, instead of requiring a single choice from a menu.
In a multi-object list, each item presented on the list of choices is
an object in its own right. Each object has its own relevance criteria
which determine its applicability. If these criteria are satisfied, the
object will appear on the resulting list; if not, the object will not
appear. Objects to be included in a multi-object list have as an
attribute the number of the particular multi-object list to which they
belong. During the consultation, such objects are processed in the
order encountered to determine their applicability, and a list of the
successful objects is maintained in memory. The final object in a
multi-object list has an attribute which indicates that it is the last
object. After processing this terminating object, the resulting list of
applicable objects is presented to the user.

Internal conclusion objects

Internal conclusion objects acquire values just as screen objects do,
but they do it transparently, without the involvement of the user,
based on previously entered information. Conclusions can have one or
more possible values to conclude or calculations to perform, as
specified by the Possible Values attribute, and may use the values of
other objects in the application. Conclusion objects may adopt the
value of another object, perform a calculation using other objects'
values, accept a literal string as a value (for example, "YES" or "Sell
the stock"), call a custom subroutine to assign a value, or import a
value from an external file.

An internal conclusion object may contain an expression to be
evaluated, and the result of such expression becomes the value
concluded by the internal conclusion object. Unique possible values or
expressions are stored as records in the Expression Data File 44, and
pointers to these records are stored as the Possible Values attribute
of the internal conclusion object. Potential values to be concluded may
have specified action criteria, which are evaluated in the order that
the values are defined. The first successful value is taken as the
value of the object, and the remaining values or expressions in the
list of possibilities are not considered. This allows possible values
to be prioritized. Where a potential value has no action criteria
specified for it, the value is always concluded.

Internal conclusion objects may alternatively import data from another
source, instead of using predefined values. Instructions for performing
the import are assigned as an attribute of the object. These
instructions include the name of the data file to be used, an
associated index file name, an expression to be evaluated to serve as
an index key for locating the desired record in the external file, and
an expression to be evaluated to acquire the value to be imported
(usually comprised of one or more fields in the external data file).
Imports may have action criteria defined for them. At the time the
import is to be performed, such action criteria are evaluated and the
import is performed only if such criteria are satisfied.

Output objects

Output objects are a powerful way for the system to report answers or
conclusions drawn during the consultation. The knowledge base can be
configured to evaluate the applicability of various possible
alternative statements or recommendations, based on entered facts, and
assert them in some fashion at the end of the consultation.

System objects

System objects achieve basic system operations for program control in
special situations. If the consultation should be terminated and
started over, a "reset" system object can cause the values of all
objects to be released, and the system to be started again in its
initial state. System objects can cause other modules to be run, with
control then returning to the calling consultation, and can cause the
consultation to jump to a specified object under given conditions.
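
The context-sensitive menu behavior described for select-one screen objects (a choice appears only if its action criteria are satisfied, and a choice without criteria always appears) can be sketched as follows. This is a minimal illustration only: the `build_menu` function, the representation of criteria as Python callables, and the `CONTRACT_VALUE` object name are assumptions made for the example, not part of the described system.

```python
from typing import Callable, Optional

# Hypothetical representation: the facts of a consultation are a mapping
# from object names to values, and action criteria are callables over it.
Facts = dict[str, float]
Criteria = Optional[Callable[[Facts], bool]]


def build_menu(choices: list[tuple[str, Criteria]], facts: Facts) -> list[str]:
    """Return the labels of the menu choices that should be displayed.

    Every choice is processed; a choice with no action criteria (None)
    always appears, otherwise it appears only if its criteria succeed.
    """
    return [label for label, criteria in choices
            if criteria is None or criteria(facts)]


# Illustrative menu for a hypothetical contract-type question.
choices = [
    ("Fixed price", None),
    ("Cost plus", lambda f: f["CONTRACT_VALUE"] > 100_000),
    ("Time and materials", lambda f: f["CONTRACT_VALUE"] <= 100_000),
]

print(build_menu(choices, {"CONTRACT_VALUE": 250_000}))
```

Unlike the selection of a value for an internal conclusion object, every entry in the list is examined here, so several conditional choices can appear on the same menu.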
Application Level Attributes

Every Object Data File 40 contains an application
record in addition to object records. The application
record contains control information and attributes
about the application as a whole, and attributes for each
module, as distinguished from object level attributes.
For example, the name given by the developer to Module 6
is a module level attribute. It is the text string that
the system will extract and place in the upper left corner
of the screen when executing the consultation for Module 6.

The application record is distinguished from regular
objects with an "A" in the OBJECT TYPE attribute.
The NAME attribute for the application record contains
a numeral (for example, "1"), which is the application number.
The application record stores:
the title of the application as a whole,
a version number for the entire knowledge base,
the names of application-specific external files which
  should be opened at the outset of a consultation and
  left open to facilitate the exchange of data between
  the system and such external files,
the name of a default global help file for the application, and
module-specific attributes.

Module-specific attributes in the application record are:
the module number,
the title of the individual module,
a version number for the module,
the date and time of the most recent generation of the
  module's Generated Data File 14,
the name of a procedure file to be opened, to allow
  access to custom subroutines to be used by the module,
the name of a module-specific help file,
instructions used when looking for restorable consultations,
  and when saving consultations (these instructions include
  the name of an external file where consultations are stored,
  expressions used in order to construct a menu from which to
  select a particular consultation to be restored, and an
  expression used in order to construct an index key to be
  used at the end of the consultation to locate the
  appropriate record for consultation storage), and
instructions for constructing and refreshing a status
  display which is shown in the upper right corner of
  the screen during a consultation.

Relevance Criteria 

Knowledge is represented in the system by the meanings
of the objects that are created, and additionally
through the use of a fundamental attribute of every
object: its defined collection of "relevance criteria",
which consists of one or more declarative statements.
Statements take the form of a left-side expression of any
complexity, a conventional operator, and a right-side
expression of any complexity, and can be evaluated for
their truth value. These relevance criteria statements
may refer to the values of other objects in the system,
thereby creating inter-object dependency relationships.

Multiple statements in the relevance criteria set for an
object are linked by the Boolean operators "AND" and
"OR". Statements may be grouped algebraically in an
arbitrarily complex fashion, with nested parenthetical
groupings allowed, and may be assigned a required
certainty factor which must be satisfied in order for the
statement to evaluate to "true" (see "Certainty Factors"
below). The resulting set of statements in the relevance
criteria collectively form a complete description of the
circumstances under which an object should be considered
relevant and applicable to the analysis which the
expert system is designed to perform.

For example, if X is an object, a typical statement in 
its set of relevance criteria might be: "A>B". This 
creates a dependency relationship between X and the 
objects A and B, for X's applicability in the system will 
be influenced by the values of A and B. The statement 
can be evaluated for truth, using the values of A and B 
and applying the operator to compare them. 
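
As a rough illustration of the "A>B" example, the sketch below models a statement as a left-side object name, an operator, and a right-side object name, and shows both the dependency it creates and its evaluation. It assumes, for simplicity, that each side is a bare object name rather than an arbitrary expression; the `Statement` class and the `OPERATORS` table are hypothetical names introduced only for this example.

```python
import operator
from dataclasses import dataclass

# Hypothetical lookup from operator symbol to comparison function.
OPERATORS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Statement:
    left: str   # name of the object on the left side
    op: str     # connecting operator symbol
    right: str  # name of the object on the right side

    def depends_on(self) -> set[str]:
        """Objects whose values influence this statement."""
        return {self.left, self.right}

    def is_true(self, values: dict[str, float]) -> bool:
        """Compare the current values of the two referenced objects."""
        return OPERATORS[self.op](values[self.left], values[self.right])


# A statement in the relevance criteria of some object X.
stmt = Statement("A", ">", "B")
print(sorted(stmt.depends_on()))
print(stmt.is_true({"A": 7, "B": 3}))
```

Collecting `depends_on()` over all of an object's statements gives the set of objects that must be processed before it, which is the dependency relationship the text describes.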

The groupings of statements, possibly nested, and
alternatives among them (signified by a linking "OR"
Boolean operator) are expanded by the Generator Program 8
during the creation of the Generated Data File
14 in order to form distinct, alternative, enabling
"pathways" for the success of the criteria. All statements
within such a pathway are connected by "AND", so
that all statements must evaluate to "true" in order for
the pathway's requirements to be satisfied. Any one
such pathway for an object, when all of its statements
evaluate to "true", is sufficient to designate the object
as relevant and applicable. If, upon evaluation, at least
one of an object's enabling pathways does succeed and
the object is therefore considered to be applicable, it is
said to "fire". When an object fires, the consequence is
simply that the Object Processing Program 9 ought to
take appropriate action with it. What the object does
when it fires is determined by its other defined attributes.
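
The expansion into pathways described above amounts to rewriting the criteria in disjunctive normal form. The following is a minimal sketch of that idea, assuming criteria are held as nested tuples such as `("AND", x, y)` and that statements are represented only by labels; both representations, and the function names `expand` and `fires`, are illustrative and are not the Generator Program's actual data format.

```python
from itertools import product


def expand(node):
    """Expand nested AND/OR criteria into a list of pathways.

    A pathway is a list of statement labels that must all be true.
    Statements are treated as indivisible units.
    """
    if isinstance(node, str):
        return [[node]]
    op, *children = node
    expanded = [expand(child) for child in children]
    if op == "OR":
        # Each alternative contributes its own pathways unchanged.
        return [path for paths in expanded for path in paths]
    if op == "AND":
        # Distribute: pick one pathway from every child and join them.
        return [[label for path in combo for label in path]
                for combo in product(*expanded)]
    raise ValueError(f"unknown Boolean operator: {op}")


def fires(node, truth):
    """An object fires if every statement in at least one pathway is true."""
    return any(all(truth[label] for label in path) for path in expand(node))


# Hypothetical criteria: a AND (b OR (c AND d))
criteria = ("AND", "a", ("OR", "b", ("AND", "c", "d")))
print(expand(criteria))
```

For these hypothetical criteria the expansion yields two pathways, one requiring statements a and b and one requiring a, c, and d; neither contains an "OR" term or a grouping.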

Certain objects in an application may always fire. For 
example, some initial data must always be collected in 
order to begin the analysis. Such objects are treated as 
special cases and are not assigned any relevance criteria. 
The absence of relevance criteria in this context is an 
indication to the Object Processing Program that the 
object should always fire. 

The concept of relevance criteria is extended in the 
system beyond its fundamental application of describing 
the circumstances under which an object should fire, to 
describing the circumstances under which particular 
attributes of objects should operate in alternative ways. 
Such criteria sets are called "action criteria". For exam- 
ple, in the case of an internal conclusion object having 
some number of alternative values, each value will typi- 
cally have, as an attribute of the object, a cluster of 
action criteria that specify the requirements for that 
value to be used. The action criteria statements for a 
given value are expanded if necessary to identify alter- 
native, stand-alone pathways, any one of which is suffi- 
cient to cause the value to be assigned (concluded), and 
in each case the action criteria statements form a set of 
instructions for making the binary decision of whether 
to utilize the value or ignore it. By convention, the first
successful value in a list of alternative values will be
employed, and the remaining members of the list will
not be evaluated. This allows the developer to prioritize 
the conclusion alternatives by organizing them in a 
preferred sequence. 
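
The "first successful value" convention can be sketched as a simple ordered scan. This is an illustration under stated assumptions: action criteria are modeled as Python callables over a dictionary of object values, `None` stands for a value with no action criteria, and the `conclude` function and the `SCORE` object are hypothetical.

```python
from typing import Callable, Optional

Facts = dict[str, float]
Criteria = Optional[Callable[[Facts], bool]]


def conclude(alternatives: list[tuple[str, Criteria]], facts: Facts) -> Optional[str]:
    """Return the first alternative whose action criteria succeed.

    Alternatives are tried in the developer's preferred order; once one
    succeeds, the remaining members of the list are not evaluated.
    Returns None if no alternative succeeds.
    """
    for value, criteria in alternatives:
        if criteria is None or criteria(facts):
            return value
    return None


# Hypothetical prioritized alternatives for a risk-rating conclusion.
alternatives = [
    ("HIGH", lambda f: f["SCORE"] >= 80),
    ("MEDIUM", lambda f: f["SCORE"] >= 50),
    ("LOW", None),  # no criteria: acts as a fallback
]

print(conclude(alternatives, {"SCORE": 65}))
```

Because the scan stops at the first success, placing an unconditional value last gives it the role of a default, while placing it first would hide every alternative after it.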

In the case of a screen object with alternative menu 
choices, action criteria (if any) associated with a partic- 
ular menu choice must be satisfied in order for that 
option to appear on the resulting menu, else it will be 
omitted. By convention, in this context all members of 
the list of values are processed, and values having no 
associated action criteria always appear in the resulting menu.

Internal conclusion objects may also acquire values by importing data
from external sources, such as a database file. Action criteria may be
specified to describe the circumstances under which such an import is
to be attempted.

Other examples of extended uses of the criteria concept include
describing the circumstances under which:

(1) a specified procedure or subroutine should be executed

(2) one of several alternative phrasings of text should be employed

(3) a value from one or more other objects should be dynamically
embedded in the text of a given object to form a customized message.

Processing all objects and firing them on the basis of an evaluation of
their relevance criteria produces a subset of the total objects in the
knowledge base, a list of objects which are all known to be applicable
(they successfully fired). This list can be thought of as a collection
of objects in the knowledge base which is customized to fit the current
fact pattern. The input objects in this set of successful objects have
acquired all relevant information and have reached appropriate
conclusions. All output objects in the set specify output actions that
have been taken, such as asserting their text as answers and
recommendations, and many others.

Output objects are usually blocks of text (of any length), and many
expert systems are designed specifically to produce such text. The text
could be something like "Check to see that the unit is plugged in.", or
it might say something more significant, such as "Fire the missile and
start the war." These are essentially just the communication of a
result to the user of the system. "User" in this sense might be another
system, rather than a human, and so the output object may indeed cause
an action, e.g. causing some block of code to be executed, such as a
subroutine, or an instruction to return a value to a calling program.

Alternatively, actual work product might be the goal: e.g., an
"intelligent document assembly" program might produce an actual
document, such as an insurance policy customized to the facts, risks
etc. involved in a particular person's application for insurance. It
might perform the analysis according to decision rules and company
policies which are represented through objects' relevance criteria, and
build the document using standard company language, embedding data
values at specified places in the text. In such a case, what may happen
is that the system should simply use the text of the successful objects
to create the resulting file directly. Alternatively, the objects may
just consist of pointers to files or blocks of text in other systems,
and this system may just list the identifiers of the successful objects
to a file. A follow-on program might pick up the resulting file and
look to other resources to build the end product.

All objects on the list are considered, and based on their relevance
criteria, some objects are listed as applicable under the known facts
while others are rejected as inapplicable.

Often a knowledge base will include several objects to represent
alternative outcomes for a given concept or result. For example, in an
application for medical diagnosis one object's text might say "The
available evidence indicates that this patient has Parkinson's
disease." The following object's text might say "The available evidence
indicates that this patient probably does not have Parkinson's
disease." Clearly, these two objects should not both fire, and would
have mutually exclusive relevance criteria. In the context of a
particular set of facts, if one of these objects fires and is therefore
asserted as applicable, its counterpart should not fire. Depending on
the relevance criteria specified, it might also turn out that neither
object will fire, perhaps because a previous object has ruled out the
possibility of Parkinson's disease altogether.

Table View of Criteria

Criteria statements are specified by the developer of a knowledge base
using what is called a "table view". In this representation, each
statement occupies a row in a matrix. There are columns for the Boolean
operator, left-side parentheses for algebraic grouping, a left-side
expression, a connecting operator, a right-side expression, right-side
parentheses, and a certainty factor threshold for the statement.
Left-side and right-side expressions in a statement may consist of any
syntactically valid expression in the language being utilized, and
often contain references to other objects. An object reference is
specified by the developer by using the object's name, and enclosing
the name in curly brackets ("{}").

All criteria statements (except the first) must be linked to preceding
statements by a logical operator, either "AND" or "OR". Parentheses are
used for algebraic grouping of statements, and may be nested up to five
levels deep. Parentheses are also allowed within the expressions used
in a statement, to permit normal algebraic grouping within such
expressions. Certainty factor thresholds for individual statements are
optional. Where omitted, the statement is treated as having a 100%
certainty factor (see "Certainty Factors" below).

There are eight allowed connecting operators in statements, and each is
assigned a number, as follows:

Number   Operator   Meaning
1        =          equals
2        [...]      does not equal
3        >          is greater than
4        <          is less than
5        >=         is greater than or equal to
6        <=         is less than or equal to
7        $          is contained within
8        [...]      is not contained within

The last two operators work with comparisons of character strings. The
"$" operator evaluates whether the left-side expression, comprised of a
character string, is contained within the right-side expression, which
is also a character string. Thus, the statement
"bcd" $ "abcde"
evaluates to "true", and the expression
"xyz" [...] "abcde"
also evaluates to "true".

Here is a generic example of a complex set of relevance criteria for an
object:

     Logical    Left    Left          Connecting   Right       Right
     Operator   Paren   Expression    Operator     Express.    Paren

a)                      {OBJECT1}     =            {OBJECT2}
                        {OBJECT9}     >            0
                        {OBJECT10}
The letters in the left column are labels designating the different
criteria statements, to facilitate discussion.

When the Generated Data File 14 is generated by Generator Program 8,
such criteria statement sets are evaluated and expanded to form
standalone "pathways" for the independent ways that the criteria may be
satisfied. In this expansion process, individual statements are treated
as indivisible units, and are manipulated to create pathways by
distributing common statements across parentheses groupings. The
resulting pathways no longer contain "OR" terms, or left and right
parentheses to group statements (although within a statement, any
parentheses within an expression will remain unchanged).

In the example above, there are three possible ways that this object's
criteria could succeed, causing the object to fire. Using line labels
to represent the statements, the resulting pathways are:

Firing Bias

The collection of relevance criteria statements for an object has an
implicit result that incorporates default behavior for the binary
choice of whether the object should fire. In the preferred system, this
result is that if the criteria succeed, then the object should fire. As
presented, the default is always not to fire the object, and the
relevance criteria define the circumstances under which this default
behavior should be overridden and the object caused to fire. This is
referred to as the "firing bias" of the object: the system is biased
against firing objects, and specified criteria must be satisfied in
order to overcome this bias and affirmatively assert the applicability
of the object.

Extensions of the criteria concept within the system (for example, the
use of action criteria to control whether a value should be used or an
import should be attempted) use the same "affirmative" approach: action
criteria specify the condition under which something should happen,
with the default being that the action should not be performed.

An alternative embodiment of the invention incorporates the opposite
default behavior. In such a system, the firing bias is reversed, so
that an object will always fire unless its relevance criteria
statements are satisfied, in which case the object will not fire. Thus
the relevance criteria become a description of when not to fire the
object, rather than a description of when to fire it. In this
embodiment of the invention, an application-level attribute specifies
which firing bias to use, so that the Object Processing Program 9
behaves appropriately.

There is no inherent reason to prefer one firing bias over the other.
In fact, overall system storage requirements are minimized if the two
biases are both used and applied on an object by object basis, so that
some objects have one firing bias while other objects have the opposite
firing bias. In this alternative embodiment, an object-level attribute
specifies the object's firing bias. If an object typically will fire in
most cases, it may be more economical to represent the knowledge about
its behavior by enumerating the circumstances under which it should not
fire, i.e. by giving it a bias toward firing. Alternatively, if an
object generally should fire only in specified circumstances, it may be
easier to identify those circumstances and give it a bias against
firing. However, the mixture of two firing biases introduces
significant maintenance problems in the system. Developers of a system
find it much easier to deal with a consistent firing bias, and
confusion can therefore be avoided by adopting a single firing bias and
applying it consistently throughout the system.

Storage of Criteria

When specifying criteria for an object, the developer creates
evaluatable statements. As shown in FIG. 5, unique statements that have
been created are stored in a separate table (the Statement Data File
42), and are assigned a unique identifying number, using the same
identifier scheme as the Object Data File 40. Left-side and right-side
expressions used in statements are stored in the Expression Data File
44, again with a unique identifier number.

Criteria sets are stored in the Object Data File. Statements may be
reused by different objects, and expressions used in statements may be
reused by different statements. Consequently, statements are stored and
referenced separately from the objects which use them, and expressions
are stored and referenced separately from the statements which use
them.

The identifying numbers in each file serve as pointers among the three
data files. Criteria sets in the Object Data File contain pointers to
the applicable statements in the Statement Data File, using the
statements' identifiers. An expression is stored only once in the
Expression Data File. When it is used within a statement, the statement
records this by using the expression's identifier as a pointer into the
Expression Data File. Because a statement always consists of a
left-side expression, an operator, and a right-side expression, a
representation of a statement as a numeric string is possible:

<expression identifier> <operator number> <expression identifier>
EXAMPLE: 042530379

In this example, "0425" is the identifier for the left-side
expression. The expression itself may be retrieved by locating
identifier "0425" in the Expression Data File and retrieving its
stored expression, which might be something like
"{OBJECT_63} + {OBJECT_09}". The "3" in the fifth position of the
string stands for the third connecting operator, ">". Finally, the
"0379" is the right-side expression's identifier, which when
located in the Expression Data File might yield an expression such
as "100,000". The complete statement therefore expands to become:

{OBJECT_63} + {OBJECT_09} > 100,000

This is a statement which can be evaluated for truth, using values
acquired during processing of the system. Such a statement can
become one of an arbitrary number of statements in a given
object's set of relevance criteria, and can be reused by other
objects in their criteria sets as well. The 9-digit string
representing this statement in encoded form is stored in the
Statement Data File, and is assigned its own identifier, such as
"2947". Because the expressions within a statement are referenced
by pointers to the Expression Data File, the reference to the
value "100,000" in this example can be reused by many other
statements as well.

Objects store criteria in the form of pointers to statements such
as the example above, using the statements' identifiers. If an
object's firing behavior depended solely on the example statement
above, its relevance criteria would consist of the string "2947",
i.e. this statement's identifier. Boolean operators which link
statements in criteria sets are represented by an underscore ("_")
for "AND", and a period (".") for "OR". Parentheses used to group
statements in a set of criteria are stored literally. Thus,
suppose object X had the following criteria:

<statement 1, having identifier number: 0123>
AND ( <statement 2, having identifier number: 0234>
OR <statement 3, having identifier number: 0345> )

This set of statements would be stored in the Object Data File as
the character string:

0123_(0234.0345)

Other means of representing statements and expressions are
certainly within the scope of the invention. For example, storage
requirements for the knowledge base could be reduced by using a
different pointer convention, where pointers are of shorter
length.

Creating the Knowledge Base

The preferred implementation provides a computer program (the
Knowledge Base Development Program 4) to facilitate the creation
of applications, modules, and objects, and the editing of their
attributes. A developer who wishes to create a new application
selects "Create New Application" from a menu, and after certain
information such as the application name and number are entered,
the program creates the necessary files. A new module for an
existing application is created in a similar fashion: after the
developer provides a name and number for the new module, this data
is added to the Object Data File's 40 application record.

Objects are always created and edited within the context of a
particular module within an application. In order to work with
objects, the developer opens the application and module of
interest by selecting from a menu. "Opening" an application causes
its files to be placed in use, and "opening" a module causes
indexing information to be updated for all objects in the
application's Object Data File 40. The ATT field of the Object
Data File contains all attribute data for each module of which an
object is a member (see FIG. 5). Those objects which are members
of the module being opened have certain attribute data extracted
from the module's segment in the ATT field, and this data is
placed in the ATX field for use in indexing. Objects which are not
members of the module being opened will have a blank ATX field,
for there is no data from the ATT field to extract and insert into
the ATX field. This allows records with blank ATX fields to be
filtered out, and the resulting set of records which are in use
and available for editing are just those objects which are members
of the opened module.

When a module has been opened, an "action" menu appears, giving
the developer a choice of several alternative actions to perform.
Options on this menu that relate to working with specific objects
in the knowledge base are "Edit", "Create", and "Delete". Other
options on the menu offer various utility and diagnostic services,
such as a utility to create a report which lists the contents of
the knowledge base for this module. Other possible options are the
Validation Table Generator 5, the Natural Language Interpreter 6
and the Explanation and Diagnostic Utility 7. Another option on
this action menu is "Create the Generated Data File", which
invokes the Generator Program 8 to put information from the
finished knowledge base in a file format suitable for processing
the module as the intended expert system. Such processing will be
performed by the Object Processing Program 9.

To edit the attributes of an object, the developer selects "Edit"
from the action menu and is then presented with a menu of existing
objects in this module. The developer has options for what objects
are presented and for how this menu should be displayed. For
example, the developer might direct the menu to display only
internal conclusion objects, and to display them in alphabetical
order rather than in their defined sequence order.

Selecting an object from this list opens that object for editing,
and causes a menu of object attributes to appear. The developer
selects the particular attribute to be edited for this object.
Depending on the attribute involved, an appropriate interface is
provided to facilitate the editing. For example, if the text of a
screen object is to be edited, then a window opens showing the
text that is currently assigned as the object's text attribute.
The developer can make changes to this text and then save the new
version.

To create a new object, the developer selects "Create" from the
action menu, and must then enter the proposed name and sequence
position for the new object. The system then checks to see if the
object already exists. Several variations are now possible:

(1) If the object does exist and it is already a member of the
module being edited, the creation action is rejected, for
duplicate object names are not allowed.
(2) If the object exists as a member of one or more other modules,
the effect of the creation action is to include it as a member of
the current module as well. The assignment to the object of at
least one attribute for the current module is enough to make it a
member of the module. In this situation, the developer has the
opportunity to copy the attributes of the object that have been
assigned for purposes of another module to become its attributes
for the current module as well, saving repetitive data entry.

(3) If the proposed object does not yet exist, a new record is
added to the Object Data File 40 for it and a new system
identifier number is assigned to it.

After an object is added to the current module, the normal editing
process for its attributes begins.

Object deletions are handled in a similar manner. The developer
chooses "Delete" from the action menu, and then selects from a
list of current objects the object to be deleted. If the object is
a member of the current module only, its record is removed from
the Object Data File. If it is a member of one or more other
modules as well as the current module, the developer is asked if
the object should be removed from all modules, or only from the
current module. If it is to be removed from all modules, the
record is deleted from the Object Data File. If it is to be
removed from this module only, its attribute data for this module
is deleted, but the object's record is retained, and its attribute
data for other modules is unaffected.

When an object is added or deleted from a module, the
developer-defined positional sequence numbers for all objects in
the module are adjusted to account for the change. When an object
is deleted, all objects which contain references to it are visited
and such references are removed, in order to maintain referential
integrity among objects. If the deleted object was a member of
other modules as well, and it was deleted from all modules, all
objects in those modules must also be visited in order to remove
references to it. If it was deleted only for purposes of the
current module, this process is limited to just those objects
within the current module.

Most attribute data that is entered for an object is stored
directly in the object's ATT field in the Object Data File. For
example, if a conclusion object is designed to import data from an
external file, the instructions concerning the import operation to
be performed are stored as a character string in the ATT field. If
any action criteria are specified for the import operation, such
criteria are stored together with the import instructions. Certain
types of attribute information, however, are stored in the ATT
field in the form of pointers to records in the other files of the
knowledge base, the Statement Data File 42 and the Expression Data
File 44.

If possible alternative values for an object are predefined, these
values are stored in the Expression Data File, which is a
repository for all unique values that have been identified in the
knowledge base. For example, a screen object might ask the
question "What is the type of purchase contract involved?" and
supply a menu of predefined choices for the user to select among.
Each of these predefined choices is a possible value for the
object, and each is stored in the Expression Data File. Similarly,
an internal conclusion object might have four alternative values
that it might conclude under different fact situations, and these
possible values could be either literal strings (e.g. "YES") or
evaluatable expressions (e.g. one object's value added to another
object's value). Each of these strings or expressions is stored as
a value in the Expression Data File.

Each unique possible value in the Expression Data File is assigned
a unique, 4-digit, numeric system identifier, and this identifier
is stored as a value designator in the object's ATT field, serving
as a pointer to the corresponding entry in the Expression Data
File. In this way, possible values may be stored only once, and
may be reused by other objects in the knowledge base.

To facilitate editing the relevance and action criteria of an
object, an interface is provided for the "table view" of criteria,
to support the entry and manipulation of criteria statements. In
this interface, each statement occupies a row on the screen, and
the rows are divided into columns according to the columns in the
table view (see the section entitled The "Table View" of
Criteria). After the developer edits and chooses to save the new
version of the table view, the statement set is stored as the
criteria for the object. Relevance criteria for an object are an
attribute in their own right, while action criteria are stored
together with the data elements to which they relate.

To facilitate the reuse of statements, and of the expressions
[illegible] through the use of pointers to the [illegible] Data
File [illegible] that have been entered into the system, each with
a unique, 4-digit identifier.

Creating the Generated Data File 14

To generate a usable system from the knowledge base that has been
created, the objects in the knowledge base are organized by the
Generator Program 8 into a linear sequence for processing, using a
conventional topological sort algorithm. This sequencing operation
is based on identifying and taking into account all inter-object
dependency relationships that have been created by the statements
used in the objects' criteria sets, and by references to objects
in other attributes (for example, the value of one object may be
dynamically embedded directly into the text of another object,
creating an inter-object dependency relationship without the
involvement of a criteria statement). When the resulting system is
executed, all objects are considered in sequence. The final
sequence position of an object becomes an additional attribute of
the object.

The notable feature of the resulting sequence is that all objects
on which a given object depends (called its list of "upstream"
objects) will precede it, and all objects which in turn depend on
the given object (its "downstream" objects) will follow it. When
sorting the objects into their final system sequence, they are
considered in the sequence order specified by the developer.
Therefore, the developer-defined sequence is respected to the
extent possible. Only if an object must be moved to a different
position in recognition of an unsatisfied inter-object dependency
will the developer-defined sequence be disrupted. If an object
must be moved, it is because it depends on some later object, and
so it must be moved downward in the sequence to a position after
the object on which it depends. It is inserted into the sequence
position immediately following the required object.

An implication of this sequential sorting of objects is that
incidences of circular reasoning may be identified by the system.
If an object cannot be placed in the resulting sequence so that
the sorting condition is met (i.e., so that all objects on which
it depends precede it, and all objects dependent on it follow it),
then there is some circular element to the object dependency
relationships. For example, suppose object X depends on objects A,
B and C, and objects D, E and F depend on X. Now suppose that
object B depends on object F. X must be sequenced after B, F must
follow X, yet B must be sequenced after F. Such circular
references indicate a fault in the system logic, and must be
resolved before the system may successfully be generated.
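
The sequencing and circularity check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Generator Program's actual code: it uses a standard Kahn-style topological sort with a priority queue keyed on the developer-defined position, which is a substitute for the "move the object to just after the object it depends on" procedure described in the text. The function name `sequence_objects` and its arguments are hypothetical.

```python
import heapq

def sequence_objects(developer_order, depends_on):
    """Return an order in which every object follows the objects it depends on.

    developer_order: object names in the developer-defined sequence.
    depends_on: maps an object name to the names of its upstream objects.
    """
    position = {name: i for i, name in enumerate(developer_order)}
    unmet = {name: 0 for name in developer_order}        # unmet upstream count
    downstream = {name: [] for name in developer_order}  # reverse edges
    for name, upstream in depends_on.items():
        for dep in upstream:
            unmet[name] += 1
            downstream[dep].append(name)

    # Among the objects that are ready, always take the one that comes
    # earliest in the developer's own sequence.
    ready = [(position[n], n) for n in developer_order if unmet[n] == 0]
    heapq.heapify(ready)
    result = []
    while ready:
        _, name = heapq.heappop(ready)
        result.append(name)
        for child in downstream[name]:
            unmet[child] -= 1
            if unmet[child] == 0:
                heapq.heappush(ready, (position[child], child))

    if len(result) != len(developer_order):
        # Whatever could not be placed is on a cycle or downstream of one.
        stuck = sorted(n for n in developer_order if unmet[n] > 0)
        raise ValueError("circular dependency involving or blocking: "
                         + ", ".join(stuck))
    return result
```

With the example from the text (X depends on A, B and C; D, E and F depend on X), the sort succeeds; adding the dependency of B on F makes it raise an error, since no valid position for X, B or F exists.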

In the preferred embodiment, once the objects have been sequenced,
a Generated Data File 14 is generated to hold all of the necessary
data about the objects and the statements. (See FIG. 6.) The
addresses of those bytes in the Generated Data File where each set
of necessary data about each object and each statement begins are
recorded in control strings (discussion below), so that when data
is required about a statement or object, the Object Processing
Program 9 can go to the appropriate Generated Data File address
and retrieve all necessary data. This Generated Data File is
accessed at runtime using low-level file functions, which allow
direct manipulation of the file pointer. Certain application-level
and module-level attribute data (such as the date the file was
generated and version numbers) are placed in a Header string at
the beginning of the Generated Data File.
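
As a rough illustration of address-based retrieval, the following Python sketch writes records back to back, remembers the byte address where each one begins, and later moves the file pointer to an address to read one record. The record framing (a 4-byte length prefix) and the function names are assumptions made for the example; the text does not specify how records are delimited in the Generated Data File.

```python
import io

def write_records(records):
    """Write records back to back; return the bytes and each start address."""
    buffer = io.BytesIO()
    addresses = []
    for payload in records:
        addresses.append(buffer.tell())
        # Hypothetical framing: 4-byte big-endian length, then the payload.
        buffer.write(len(payload).to_bytes(4, "big"))
        buffer.write(payload)
    return buffer.getvalue(), addresses

def read_record(handle, address):
    """Move the file pointer to the given address and read one record."""
    handle.seek(address)
    length = int.from_bytes(handle.read(4), "big")
    return handle.read(length)

data, addresses = write_records([b"object one", b"object two", b"object three"])
handle = io.BytesIO(data)
second = read_record(handle, addresses[1])  # direct jump, no scan of record 1
```

The point of the design is that retrieval cost does not depend on how many records precede the one wanted: the stored address is enough to reach it.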

An initial state of the system is also generated, in which object
firing status and statement truth values are recorded in their
initial states. In general, all statements are considered to be
false in the initial state, and all objects are considered not to
fire. However, there are exceptions to this general rule. Some
objects always fire (for example, in order to collect initial data
that is always required). By convention, such objects will have no
assigned relevance criteria, and the initial system state calls
for these objects to fire. Some statements are true at the outset;
for example, suppose a statement, referring to the value of object
X, reads:

X=""   (i.e. the value of object X is the null string)

This statement is true in the initial state of the system, for
object X has not yet acquired a value. The initial system state
will record a truth value of "true" for this statement. The
Statement Queue Array 34 (see "Statement and Object Queue Arrays"
below) will be preloaded with references to these statements at
the start of processing 110.

As shown in FIG. 6, the structure of the Generated Data File is:

<Header information>
<Statement data>
<Object data>
<Control strings>

Statement data is represented as follows:

<the statement's identifier>
<generated code for the statement>
<list of pointers to objects that can be affected by the
statement's truth value> (each pointer to an object is coupled
with the sequence number of the object's designated controlling
object (discussion below))

Object data is represented as follows:

<the object's identifier>
<all object attribute data, separated by attribute codes (see
below)>
<list of pointers to objects that can be directly affected by the
object> (each pointer to an object is coupled with the sequence
number of the object's designated controlling object)
<list of pointers to statements that can be affected by the
object> (each pointer to a statement is coupled with the sequence
number of the statement's designated controlling object)

Control strings (see discussion below):

<Object Firing Control String> 20
<Statement Truth Control String> 22
<Default Value Control String> 24
<Object Address Control String> 26
<Statement Address Control String> 28

Attribute codes are internal separators in an object's attribute
data that mark divisions between attributes. Each consists of a
delimiter, followed by a code that describes the nature of the
attribute whose data follows. If the text of an object was "Enter
the amount of the mortgage:", its attribute data in the Generated
Data File would include the segment:

. . . <delimiter> <code> Enter the amount of the mortgage:
<delimiter> <next attribute's code and data> . . .

The Object Processing Program 9 

The inferencing mechanism works by considering 
each object in a defined sequence. In its simplest form, 
each object is examined in turn, beginning at the start of 
the defined object sequence, and continuing until all 
objects have been evaluated and the sequence ends. 
When an object is evaluated, its relevance criteria are 
evaluated to determine if the object should fire. If not, 
the analysis moves on to evaluate the next object in the 
defined sequence of objects. If the object does fire, 
action is taken according to its defined attributes. A 
question may be asked of the user, a conclusion drawn, 
an importation or exportation of data performed, a subroutine
executed, a message displayed, or a system variable updated. The
object acquires a "value", and this value is recorded, to be used
in the evaluation of criteria
statements employed by subsequent objects. Typically, 
the values that are acquired by objects are stored in a 
temporary buffer or data file (the Object Values File 
30), in which objects may be looked up and their values 
retrieved. (See FIG. 9). 

Evaluation of a criteria statement occurs by treating it as a line
of executable computer code, and executing it to receive a logical
value in return. For example, in FoxPro, such an evaluation may be
performed using the "&" operator (referred to as the "ampersand"
operator or the "macro" operator). If X is a memory variable
containing the character string "5>3", then the statement

Y=&X

evaluates to "true" and stores a logical value of .T. to the
variable Y. Without the ampersand (i.e., if the statement read
"Y=X"), the statement would simply store the contents of X to Y,
i.e. Y's new value would be the character string "5>3". With the
ampersand, X's contents are evaluated and the result of the
evaluation is stored to Y. Alternatively, FoxPro also provides an
EVAL() function, designed to evaluate expressions, and this may be
used in place of the "&" operator.

In this fashion, a criteria statement may be stored to the memory
variable X, and X may be evaluated to yield a logical truth value.
References in the statement to objects must be provided with the
current values of those objects, however, and this can be
accomplished in at least two different ways.

(1) The values of the objects can be looked up in the Object
Values File 30, and their literal values can be inserted into the
statement in place of the object references, using string
manipulation functions. For example, if the statement X takes the
form "A>B", and A and B are objects, each of them would be looked
up and their current values retrieved. If A's current value was 5,
and B's value was 3, then the characters "5" and "3" would be
inserted into the code string in place of the "A" and "B"
characters. Several character string functions could be used for
this work; one of these is FoxPro's STRTRAN() function, which
translates instances of a given character in a string into another
character:

X="A>B"   (This is the initial version of the statement)
(Now retrieve current values of A and B. A=5, and B=3.)
X=STRTRAN(X,"A","5")   (Replace every instance of "A" in X with "5")
(X's value is now: "5>B")
X=STRTRAN(X,"B","3")   (Replace every instance of "B" in X with "3")
(X's value is now: "5>3", and now X is evaluatable with the "&"
operator)
Y=&X   (Evaluates the contents of X, executing the assertion "5>3")
Y now has the value of .T. (logical "true").

(2) An alternative method of evaluating the statement avoids such
string manipulation and creates new memory variables, using the
names of the objects. In the above example, when the current
values of A and B are retrieved, two new variables, A and B, are
created and these values are stored to the variables:

A=5
B=3

Now the content of X (the string "A>B") is directly executable:

Y=&X

FoxPro (or whatever language is being used) will recognize the
references to the memory variables A and B, and will substitute
their values in the evaluation of the statement, to yield the
logical "true".
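
For readers without FoxPro, the two approaches can be approximated in Python. This is only an analogy: Python's built-in `eval` stands in for the "&" operator, `str.replace` stands in for STRTRAN(), and the function names are hypothetical. As with any evaluation of text as code, it assumes the statements come from a trusted knowledge base.

```python
def evaluate_by_substitution(statement, values):
    """Approach (1): splice literal values into the text, then evaluate it."""
    for name, value in values.items():
        # Like STRTRAN(), this replaces every occurrence of the name, so it
        # would misfire if one object name were a substring of another.
        statement = statement.replace(name, repr(value))
    return bool(eval(statement, {"__builtins__": {}}, {}))

def evaluate_with_variables(statement, values):
    """Approach (2): leave the text unchanged and expose values as variables."""
    return bool(eval(statement, {"__builtins__": {}}, dict(values)))
```

For the statement "A>B" with A=5 and B=3, both functions return True. The second approach avoids the substring problem noted in the comment, because names are resolved by the evaluator rather than by text replacement.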

Because each object's relevance criteria statements are clustered
with the object, the object can be considered for the first time
when it is reached in the sequence. At that time, its criteria are
examined and statements in its enabling pathways are evaluated,
using the current values of any objects referenced in the
statements. If a pathway succeeds, the object fires, and remaining
pathways are not considered; if no pathway succeeds, the object
does not fire and is ignored. An important consequence of this
simple approach is that every object in the system will have its
criteria evaluated.
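
The pathway rule in this simple approach can be sketched as follows. The representation is an assumption made for illustration: each enabling pathway is a list of statement identifiers that must all be true, and `statement_is_true` is a hypothetical callback that evaluates one statement. Objects with no relevance criteria, which always fire by convention, are not handled here.

```python
def object_fires(pathways, statement_is_true):
    """Return True as soon as one pathway has every statement true."""
    for pathway in pathways:
        if all(statement_is_true(statement) for statement in pathway):
            return True  # remaining pathways are not considered
    return False

# Criteria "0123 AND (0234 OR 0345)" expand into two enabling pathways.
pathways = [["0123", "0234"], ["0123", "0345"]]
truth = {"0123": True, "0234": False, "0345": True}
fires = object_fires(pathways, lambda statement: truth[statement])
```

Here the first pathway fails on statement 0234 and the second succeeds, so the object fires.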

In the preferred embodiment of the method, however, this simple
approach of examining and evaluating each object in the sequence
is made much more efficient through the use of dependency pointers
and control strings. This approach allows the system to "look
ahead" and efficiently propagate the implications of newly
acquired information at the time such information is received. In
this fashion, objects downstream in the analysis sequence can be
"turned on", i.e. designated to fire in advance of reaching them
in the sequence, if information acquired by upstream objects
warrants that action. When system processing reaches an object in
the defined sequence, its firing behavior will already have been
determined.

One important implication of this approach is that not 
all objects need to be evaluated. In general, objects start 
out not firing, and if nothing occurs upstream in the 
analysis to change that status, the object will continue 
to not fire and when reached it can safely be bypassed 
without an explicit examination of its relevance criteria. 
This can lead to significant improvements in system 
performance, for potentially thousands of non-firing 
objects could be bypassed with essentially no process- 
ing. 

Control Strings 60 

To support this approach, control strings 60 are generated by the
Generator Program 8 along with the system data to record the
firing status of objects and the truth values of statements, and
other required information. Several of these control strings are
bitmaps (character strings consisting of ones and zeroes, such as
"0011010110 ..."), and some control strings store literal data.
They are stored in Generated Data File 14.

One such control string (the Object Firing Control 
String 20) holds the firing status of objects in a bitmap, 
where a "1" means the object will fire and a "0" means 
it should not fire. Each object is represented in the 
control string as a single character, and objects are 
identified by their sequence numbers, which map posi- 
tionally into the string. Thus if the Object Firing Con- 
trol String begins with the series "1001000 . . . ", the first 
object in the sequence will fire, the next two objects do 
not fire, the fourth object fires, and the following three 
do not fire. This control string is constructed when the 
system is generated and the initial system state is de- 
scribed. There are as many digits in the string as there 
are objects; all objects are represented by zeroes, except 
for those objects that always fire, which are represented 
by ones. 
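
A minimal Python sketch of how such an initial Object Firing Control String might be built and read. The function names and the object names are hypothetical; the positional mapping by one-based sequence number follows the description above.

```python
def initial_firing_string(objects_in_sequence, always_fire):
    """One character per object: "1" for objects that always fire, else "0"."""
    return "".join("1" if name in always_fire else "0"
                   for name in objects_in_sequence)

def will_fire(control_string, sequence_number):
    """Look up an object by its one-based sequence number."""
    return control_string[sequence_number - 1] == "1"

objects = ["o1", "o2", "o3", "o4", "o5", "o6", "o7"]
firing = initial_firing_string(objects, always_fire={"o1", "o4"})
```

With these inputs the string is "1001000", matching the example in the text: the first and fourth objects fire and the others do not.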

A similar bitmap control string is generated for state- 
ment truth values (the Statement Truth Control String 
22), in which statements that evaluate to "true" are 
represented by a "1" and statements that evaluate to 
"false" are represented by a "0". Here, since statements 
have no sequence, they are mapped into the control 
string using their unique identifying numbers. Thus, if 
the statement with an identifying number of "0147" 
evaluates to "true", then the 147th digit in the control 
string will be "1". When the initial system state is gener- 
ated, all statements are represented with "0", except for 
those statements which initially are "true", which are 
represented with "1". 
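
The identifier-to-position mapping can be illustrated in the same way. This sketch assumes, as the "0147" example suggests, that the numeric value of the identifier is the one-based position of the statement's digit; the function names are hypothetical.

```python
def statement_is_true(control_string, statement_id):
    """Read the digit for a statement, e.g. "0147" is the 147th digit."""
    return control_string[int(statement_id) - 1] == "1"

def set_truth(control_string, statement_id, is_true):
    """Return a copy of the control string with one statement's digit set."""
    index = int(statement_id) - 1
    digit = "1" if is_true else "0"
    return control_string[:index] + digit + control_string[index + 1:]

truth_string = "0" * 200                      # initial state: all false
truth_string = set_truth(truth_string, "0147", True)
```

Python strings are immutable, so `set_truth` returns a new string rather than changing a digit in place.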

Processing in the system occurs by examining digits 
in the Object Firing Control String. The consultation 
begins by starting at the beginning of this control string, 
and terminates when the end of the string is reached. A 
pointer into the control string 112 is maintained to track
the current position in the string (the object in this se-
quence position is referred to as the "current object").
At any point in the process, the pointer can be reposi- 
tioned to a different place in the string, if desired. For 
example, if a user flags an earlier object for review, the 
system can jump back to that object and reprocess it 
simply by changing the value of the Object Firing Con- 
trol String pointer. 
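
The traversal with a repositionable pointer might look like the following Python sketch. It is deliberately simplified: the control string is treated as fixed, so the propagation of new information to downstream objects is omitted, and `process_object` is a hypothetical callback that may return a sequence number to jump to (for example, when the user flags an earlier object for review).

```python
def run_consultation(firing_string, process_object):
    """Walk the control string; process each object whose digit is "1"."""
    visited = []
    pointer = 0  # zero-based index of the current object
    while pointer < len(firing_string):
        sequence_number = pointer + 1
        if firing_string[pointer] == "1":
            visited.append(sequence_number)
            jump_to = process_object(sequence_number)
            if jump_to is not None:
                pointer = jump_to - 1  # reposition the pointer
                continue
        pointer += 1  # a "0" is passed by without further work
    return visited

reviewed = []

def process(sequence_number):
    # Hypothetical user action: on first reaching object 4, go back to 1.
    if sequence_number == 4 and not reviewed:
        reviewed.append(sequence_number)
        return 1
    return None

trace = run_consultation("1001000", process)
```

In this run objects 1 and 4 are processed, the pointer is moved back to object 1, and both are processed again before the end of the string is reached.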

In the Object Firing Control String, if the character 
being examined is "0", the object in that sequence posi- 
tion is considered irrelevant and is passed by. If the 
character is "1", the object fires, and may acquire a new 
value. This value change may cause changes to the retrieve the statement's data, the file pointer in the Gen-
truth values of statements that refer to the object Such erated Data File is moved to this address and the data 
statements are flagged for evaluation, and this evalua- read into memory. 

tion occurs after each flagged statement's controlling Since knowledge about object behavior is repre- 
object (see below) is processed. If a statement's truth 5 sented through the use of statements, and these state- 
value changes, its representation in the Statement Truth ments create inter-object dependency relationships 
Control String will flip from a "0" to a "1" (if it goes wmc h are fully known once the knowledge base is 
from false to true), or from a "1" to a "0" (true goes to defined, pointers can be generated by the Generator 
fa3 ^' , , Program 8 along with other system data to indicate 

Changes to the truth value of statements in turn affect whether other objects downstream, whose behavior depends on the truth values of these statements, will fire. Such objects are flagged for evaluation, and this evaluation occurs after each flagged object's controlling object is processed. If an object's firing behavior changes, the object's character in the Object Firing Control String is changed from "0" to "1" (the object now fires), or from "1" to "0" (an object that was previously designated to fire will now not fire).

[...] when the system is generated. A default value may be assigned as an attribute of an object, and this value must be assigned to the object during the consultation if the object does not fire. Since an unfiring object will generally not have its attribute data examined, and in most cases will not acquire a value, these special cases of default values must be recognized. To provide this capability, a Default Value Control String is generated which duplicates the Object Firing Control String. In this case, however, any "1" characters in the string designate objects that have a default value attribute, which should be assigned to the object if the object does not fire. If an object is being bypassed because its character in the Object Firing Control String is a "0", the Default Value Control String is consulted to see if the character for this object in that string is a "1". If so, the object data must be extracted from the Generated Data File 14 and the default value assigned to the object at that time.

Addresses in the Generated Data File where object and statement data sets begin are recorded in an Object Address Control String 26 and a Statement Address Control String 28. Object addresses map positionally into the Object Address Control String by object sequence number, i.e. the address for the 42nd object in the object sequence will be the 42nd address stored in the Object Address Control String. If the object in sequence position 42 is currently being processed, and it fires, the Object Processing Program 9 will need its data. To locate the data, the system will consult the Object Address Control String and extract the 42nd address. This is the byte address in the Generated Data File 14 for the start of this object's data. The file pointer in the Generated Data File is moved to that address, and the next sequence of data is read into memory. A delimiter marks the end of each object's data set.

Statement addresses are stored in the Statement Address Control String 28 and are coupled with statement identifier numbers. Statements are not sequenced in the system as objects are, and the Generated Data File for a given module will likely use a subset of all unique statements in the application. Therefore, the most efficient way to store statement addresses is by storing the statements' identifiers along with their addresses. Thus if a statement's identifier is "0165", the Statement Address Control String will be searched for this identifier, and the file address for the statement's data will be stored in the bytes immediately following this entry. To

[...] which statements and objects can be affected by changes in the values of particular statements and objects.

Objects and statements are treated separately in this pointer-based approach. Once the sequencing of objects has been accomplished, all inter-object dependency relationships have been identified. Similarly, it is known, from an examination of the statements in the system, which statements can be affected by a change in each object's value. Each object thus acquires [...] statements whose truth values can be affected by a change in the object's value, and each statement [...] the objects whose behavior can be affected by a change in the truth value of the statement.

During execution [...], when an object acquires a value, the list of statements that can be affected by its value is consulted, and each statement in the list is flagged for a re-evaluation of its truth value. If, upon such a re-evaluation, a statement's truth value changes, the list of objects that can be affected by it is consulted, and each object in the list is flagged for a re-evaluation of its firing behavior. In this manner, the consequences of newly acquired facts are propagated down the sequence of objects, causing new objects to be designated to fire when their turns come or suppressing the firing of other objects that otherwise would have fired.

When an object fires, it may have action criteria within its data which govern the behavior of attributes (for example, the criteria for concluding alternative values, or the criteria for importing a value). In these cases, the action criteria are evaluated directly when needed during the processing of the object. For example, if a conclusion object fires and it has several alternative values, the first alternative's action criteria will be examined. The action criteria statements are stored as statement identifiers, and for each such identifier, its corresponding character in the Statement Truth Control String is examined. If it is a "1", the statement is true under the current facts, and the next statement in the pathway is examined. If it is a "0", the entire pathway fails, and the next pathway will be examined. If no pathway succeeds, the value is not concluded, and the action criteria for the next alternative value in the series will be examined. Typically, the final alternative value in the series will have no action criteria assigned to it, and this value will always be assigned to the conclusion if this value is reached in the evaluation of the alternative values list.

It often happens that a user will interrupt the processing of the consultation in order to go back to a prior object to see a question or message again, and the user may change the response previously given to a question. When this happens, the consultation must proceed from that point and must reprocess objects that it processed during the first pass. The reprocessing of objects that
have already acquired values creates a special situation in the consultation.

The changed value of the upstream object will trigger the re-evaluation of affected statements. If the truth value of an affected statement changes, affected objects will be re-evaluated. Objects which have already been processed and whose firing behavior is unchanged by the change in the value of the upstream object will simply be bypassed in the second pass of the consultation. If they previously fired and acquired values, these values are accepted and the consultation moves on. In certain situations, however, an object that previously fired and that still fires on the second pass should be refired anyway. For example, an object which imports data from an external file should be refired, for the change in the value of the upstream object conceivably could change the behavior of the import operation or the value of the data that is imported. To handle these situations, a system flag 90 is set in the Object Values File 30 to signal that the object should be refired.

Controlling Objects

The imposition of a sequence for objects introduces the notion of a "controlling object". A given object will have references to other objects among its attributes, in the form of statements, text embeds, lookup keys for imported data, etc. These are its set of "upstream" objects: objects on which its behavior depends. The controlling object is that object in the set of upstream objects with the highest system sequence number.

Suppose object X, in its relevance criteria, refers to five upstream objects: A, B, C, D and E. Among those five, the last to be considered, the one with the highest sequence number, is the controlling object for X's evaluation: X cannot be evaluated properly until all five objects' values are known. Any prior attempt to evaluate X would be pointless, and would create additional, wasted processing. Therefore, an evaluation of X must wait until the controlling object is reached and evaluated.

Statements have controlling objects as well. A statement may refer to various objects, one of which will have the highest system sequence number. The statement cannot be evaluated properly until all of these objects' values are known, i.e. until its controlling object is processed.

Therefore, among the attributes of an object is the list of pointers to statements which it can affect, and for each such statement, the controlling object of the statement is noted. For each statement that can be affected, a composite pointer is built consisting of the sequence number of the statement's controlling object, and the identifier of the statement to be evaluated. This composite pointer is added to the Statement Queue Array (see below) when the object's value changes, in order to flag the statement for re-evaluation.

Each statement's data, in turn, contains a list of objects which use the statement, and the controlling object of each of those objects is noted. For each such object that can be affected, a composite pointer is built consisting of the sequence number of the object's controlling object, and the sequence number of the object to be evaluated. This composite pointer is added to the Object Queue Array (see below) when the statement's truth value changes, in order to flag the object for re-evaluation.

Finally, while statements can only affect objects, objects not only can affect statements but can also affect other objects directly as well. This is called a Direct Object Link. For example, where an object's value is embedded in another object's text, a change in the embedded value will affect the object's text and might create a need to redisplay it. Composite pointers for Direct Object Links, consisting of the sequence number of the linked object's controlling object, and the sequence number of the linked object, are also built and included among an object's attributes if it has such links. When the value of the object changes, the composite pointers to linked objects are added to the Object Queue Array. Furthermore, the system flag 90 in the Object Values File 30 is set to force a refiring of the linked-to object when it is reached.

The design goals sought in the generation of the system and in the overall processing algorithm are to minimize generated data in the Generated Data File 14, and to minimize computation time at execution. The truth value of a statement should be evaluated only when it needs to be: when one of its inputs has changed value and the consultation has reached the statement's controlling object. Similarly, an object's relevance criteria should be evaluated only if necessary: when the truth value has changed for one of the statements on which it depends, and the consultation has reached the object's controlling object.

Statement and Object Queue Arrays 32, 34

In order to support the flagging of statements and objects for re-evaluation, two arrays created at the time of execution are used as queues to hold pointers to flagged entities. One array is the Statement Queue Array 34, which contains pointers to flagged statements, and the other is the Object Queue Array 32, which contains pointers to flagged objects.

Before leaving an object and going on to the next one in the sequence, the Statement Queue Array must be examined to determine if the current object is the controlling object for any statements in the queue. If so, those statements are to be re-evaluated at this point. Similarly, before leaving an object, the Object Queue Array must be examined to determine if the current object is the controlling object for any objects in the queue. If so, those objects are to be re-evaluated at this point.

Upon acquiring a new value for an object, the object's list of all statements that use this object is examined. These statements are added to the Statement Queue Array using a composite pointer which identifies the statement's controlling object, and each will be evaluated when its controlling object is reached. In some cases, that is the object currently being processed. For example, suppose a statement reads "{OBJECT_7}>0". When OBJECT_7 is processed and acquires a value, this statement is one of the statements that can be affected by the value, so a pointer to the statement should be added to the Statement Queue Array. Since the statement depends on OBJECT_7 only, OBJECT_7 is also the statement's controlling object, so the statement should be evaluated before leaving OBJECT_7. The order of processing events is:

(1) Acquire a value for OBJECT_7
(2) Add a pointer for this statement to the Statement Queue Array
(3) Before leaving OBJECT_7, search the Statement Queue Array to see if OBJECT_7 is the controlling object for any statements. If so, evaluate those statements.
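
The controlling-object rule and the three processing steps for OBJECT_7 can be sketched in Python. This is only an illustrative sketch, not the patented implementation: the `QueueArray` class, the `controlling_object` helper, and the statement identifiers "S1" and "S2" are hypothetical names, and a dictionary keyed by controlling sequence number stands in for the array layout described in the text.

```python
from collections import defaultdict


def controlling_object(referenced_seq_numbers):
    """Return the highest sequence number among the referenced objects."""
    return max(referenced_seq_numbers)


class QueueArray:
    """Flagged entities grouped under their controlling object's sequence number."""

    def __init__(self):
        self._pending = defaultdict(list)

    def flag(self, controlling_seq, entity_id):
        # A composite pointer is the pair (controlling sequence number, entity id).
        if entity_id not in self._pending[controlling_seq]:
            self._pending[controlling_seq].append(entity_id)

    def take(self, current_seq):
        # Remove and return every entity controlled by the object being left.
        return self._pending.pop(current_seq, [])


# Hypothetical data: "S1" reads "{OBJECT_7} > 0"; "S2" also uses objects 3 and 9.
statement_refs = {"S1": [7], "S2": [3, 7, 9]}
statement_queue = QueueArray()

# (1) OBJECT_7 acquires a value; (2) flag every statement that uses it.
for stmt_id, refs in statement_refs.items():
    if 7 in refs:
        statement_queue.flag(controlling_object(refs), stmt_id)

# (3) Before leaving OBJECT_7, collect the statements it controls.
due_now = statement_queue.take(7)
# "S2" waits until the consultation reaches object 9.
due_later = statement_queue.take(9)
print(due_now, due_later)
```

Grouping entries under the controlling sequence number mirrors the idea that each controlling object is listed once and is followed by the entities it releases.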
The Statement Queue Array holds pointers to statements that have been flagged because of changes in value for referenced objects. For a statement in this queue, the array holds a composite pointer containing the sequence number of the statement's controlling object, and the identifier of the statement. When that controlling object is reached, the reference to this statement is recognized and removed from the array. For efficient searching of the array, controlling objects are listed only once, at the beginning of each data element in the array. Following each unique controlling object sequence number is a list of all statements (in the Statement Queue Array) or objects (in the Object Queue Array) which have been flagged for re-evaluation when the controlling object has been processed. The array is scanned to find the controlling object of interest. If it is found, its list of statements or objects is read into memory for processing, and the data element containing the controlling object and its list is removed from the array (See FIG. 8).

The statement's identifier is used to find and extract from the Statement Address Control String the address in the Generated Data File where the statement's data begins. The file pointer in the Generated Data File is moved to that address, and the statement's data is read into memory. This data set includes the line of code that was generated for the statement by the Generator Program 8, and also the list of objects that can be affected by a change in the statement's truth value.

The generated code for the statement is extracted from this data. References to objects in a statement's generated code take the form of an "O", followed by an underscore, followed by the object's unique identifier, i.e. "O_3649". For each such instance of an object reference, the current value of the object is retrieved from the Object Values File, and a temporary memory variable is created to hold the value. This variable has the same name as the object reference in the generated code, i.e. "O_3649". When all such object references have thus been transformed into references to temporary memory variables, the statement is evaluated for truth. The resulting truth value is then compared to the current truth value for the statement in question.

If the statement's truth value has changed, the new truth value is recorded, and the statement's list of affected objects is consulted. For each such object, the object's firing status may be affected by the changed truth value of the statement. However, more efficiency can be obtained by noting that in some cases the changed truth value cannot affect the object's behavior, because of the firing bias. In the preferred embodiment, the firing bias calls for firing an object only when its relevance criteria statements evaluate to "true". Therefore, the effect of a statement's truth value going from "false" to "true" can only be to contribute toward causing the object in question to fire. But if the object is already marked for firing (because one of its other criteria pathways has already been satisfied, for example), the object's firing status cannot change and the point is moot. In such a case, there is no point in adding a reference to the object to the Object Queue Array for re-evaluation.

Similarly, if a statement's truth value changes from "true" to "false", it will contribute toward causing the object not to fire. But if the object is already not firing, nothing can change, and again no purpose can be served by adding it to the Object Queue Array to mark it for re-evaluation.

For those objects, then, where the statement's new truth value works at cross-purposes to the object's current firing status, a reference is added to the Object Queue Array to flag the object for re-evaluation. This reference consists of the sequence number of the controlling object for the object in question, and the sequence number of the object to be re-evaluated.

A similar process occurs when the controlling object is reached for an object referenced in the Object Queue Array. The reference is recognized and removed from the Object Queue Array, and the file address for the object to be evaluated is extracted from the Object Address Control String 26. The file pointer is moved in the Generated Data File to the noted address, which marks the beginning of the object's data. This data is read into memory. Included in this data is the object's set of references to relevance criteria statements, expanded into standalone pathways, each of which if satisfied is sufficient to cause the object to fire. Each statement in a pathway is referenced by its unique identifying number. The truth value of each statement is retrieved by examining the statement's character in the Statement Truth Control String. When a false statement is encountered, evaluation of that pathway stops and an evaluation of the next pathway commences. When all of a given pathway's statements are found to be true, the object fires and evaluation stops. If all pathways have been examined without finding one that succeeded, the object does not fire. The Object Firing Control String is then updated to reflect the new firing status of the object, if the firing status changed.

In this two-tier approach to the inferencing strategy, in which the evaluation of statements and the evaluation of objects are separated and pointers are used to flag statements and objects for re-evaluation, we see that maximum possible efficiency is attained, because:

(1) Action in the system is initiated only in response to changes in the value of an object;
(2) In response to a change in an object's value:
(a) only those statements in the system that can be affected by the change are evaluated,
(b) each such statement is evaluated only once, at the point where its controlling object is reached and therefore all of its inputs are known;
(3) In response to a change in a statement's value:
(a) only those objects whose current firing behavior in the system can be affected by the change are evaluated, taking into account not only the object's dependence on the statement, but also a comparison of the statement's current truth value and the object's current firing status,
(b) each such object is evaluated only once, at the point where its controlling object is reached and therefore all of its inputs are known.

Knowledge Base to Natural Language Interpreter 6

Natural language versions of relevance and action criteria for each object in the system are highly useful for system documentation and development. Such interpreted versions also provide an effective way to communicate the knowledge being encoded into the system to interested parties outside the development process, who may be helpful in verifying the accuracy and appropriateness of the expertise being captured.

The invention includes a Knowledge Base to Natural Language Interpreter Program 6 which is a computer program that uses object translation attributes to create natural language versions of relevance criteria statements 12. For a given object, its relevance criteria are retrieved from the Objects Data File 40. Each statement's pointer is followed to the corresponding record in the Statements Data File 42, and the left-side and right-side 4-digit pointer elements of the 9-digit encoded statement are extracted. These are looked up in the Expressions Data File 44, and the expressions are retrieved. The fifth digit in the statement's 9-digit representation identifies a particular operator symbol. These elements are combined as follows to form the statement:

<left-side expression> <operator symbol> <right-side expression>

This statement is then parsed to examine its elements. For each instance of an object reference in the statement, the referenced object is looked up in the Objects Data File, and its translation attribute is extracted and substituted for the object reference in the expression. A standard natural language version is substituted for the operator symbol (for example, the phrase "is greater than or equal to" is substituted for the symbol ">=" in the statement).
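
The decoding of a 9-digit statement can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the patent's own code: the function names `decode_statement` and `assemble_statement` are hypothetical, the dictionaries stand in for the Statements and Expressions Data Files, their contents are taken from the worked example given in this section, and the object reference in expression 1234 is read here as {5432}.

```python
# In-memory stand-ins for the data files (values from the worked example).
STATEMENTS = {"9182": "975310987", "0321": "065430765",
              "0432": "087610987", "0543": "123454321"}
EXPRESSIONS = {"9753": "{8163}", "0654": "{0246}", "0765": "25000",
               "0876": "{0357}", "0987": "YES",
               "1234": "{0468} + {0680}/{5432}", "4321": "{6543}"}
OPERATORS = {"1": "=", "2": "#", "3": ">", "4": "<",
             "5": ">=", "6": "<=", "7": "$", "8": "!$"}


def decode_statement(encoded):
    """Split a 9-digit statement into (left pointer, operator digit, right pointer)."""
    if len(encoded) != 9 or not encoded.isdigit():
        raise ValueError("expected a 9-digit encoded statement")
    # Digits 1-4: left expression, digit 5: operator, digits 6-9: right expression.
    return encoded[:4], encoded[4], encoded[5:]


def assemble_statement(statement_id):
    """Rebuild '<left-side expression> <operator symbol> <right-side expression>'."""
    left, op, right = decode_statement(STATEMENTS[statement_id])
    return f"{EXPRESSIONS[left]} {OPERATORS[op]} {EXPRESSIONS[right]}"


for sid in STATEMENTS:
    print(sid, "->", assemble_statement(sid))
```

Each assembled string is still in pointer form; substituting object translations for the references in curly brackets is a separate step.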

Other symbols that appear in the expressions used will also have natural language alternatives used in their place. For example, "+" will become "plus", "/" will become "divided by", etc. Combinations of objects are also translated into a more natural representation. For example, "<object translation> + <another object's translation>" will become "the sum of <object translation> and <another object's translation>". Parentheses and logical operators remain unchanged. Certainty factors become "with a certainty factor of at least <certainty factor value>".
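
A minimal Python sketch of the substitution step follows. It is an illustration only: the `interpret` function and its tables are hypothetical, it handles only the "+" and "/" symbols and the comparison operators, and it does not implement the "the sum of ..." rewriting or certainty factors described above.

```python
import re

# Phrases for comparison operators and for symbols inside expressions.
OPERATOR_PHRASES = {"=": "equal to", "#": "not equal to", ">": "greater than",
                    "<": "less than", ">=": "greater than or equal to",
                    "<=": "less than or equal to"}
SYMBOL_PHRASES = {"+": "plus", "/": "divided by"}


def interpret(left, operator, right, translations):
    """Render '<left> <operator> <right>' as a natural language sentence."""

    def expand(expression):
        text = expression
        # Replace symbols first, while object references are still digits in braces.
        for symbol, phrase in SYMBOL_PHRASES.items():
            text = text.replace(symbol, f" {phrase} ")
        # Then substitute each {object id} with that object's translation attribute.
        text = re.sub(r"\{(\d{4})\}", lambda m: translations[m.group(1)], text)
        return " ".join(text.split())

    return f"{expand(left)} is {OPERATOR_PHRASES[operator]} {expand(right)}"


# Translations as they appear in the Objects Data File of the worked example.
translations = {"0246": "Contract value", "0357": "Warranties are involved"}
print(interpret("{0246}", ">", "25000", translations))
print(interpret("{0357}", "=", "YES", translations))
```

Replacing symbols before substituting translations avoids rewriting a "+" or "/" that happens to occur inside an object's translation text.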

The resulting interpretation of the relevance criteria is also processed to ensure that sentences begin with capital letters, and that appropriate punctuation and connecting articles are present. Preamble text is added, referencing the translation and the object type attribute of the object whose relevance criteria are being interpreted. Standard text is also included to cover special situations, such as the case where no action criteria are specified for a possible value (i.e., it always will be assigned).

Here is an example of the process and the resulting interpretation of relevance and action criteria for an object:
From the Objects Data File:

Attribute            Value
Object name:         MGMTAPPROV
Object type:         Conclusion
Translation:         Required management approval
Relevance Criteria:  9182                       ← pointers to Statements
Possible Values:     YES»0321_(0432.0543)|NO    ← Data File

From the Statements Data File:

Statement ID   Statement (encoded form, with pointers to the Expressions Data File)
9182           975310987
0321           065430765    ← left pointer: 0654, operator: 3, right pointer: 0765
0432           087610987
0543           123454321

From the Expressions Data File:

Expression ID   Expression
9753            {8163}
0654            {0246}
0765            25000
0876            {0357}
0987            YES
1234            {0468} + {0680}/{5432}
4321            {6543}

← object references are in curly brackets and use object ID numbers as pointers to the Objects Data File

From the Objects Data File:

Object ID:    8163
Object name:  STDTYPE
Translation:  Contract type is "Standard"

Object ID:    0246
Object name:  CONTVALUE
Translation:  Contract value

Object ID:    0357
Object name:  WARRANTINV
Translation:  Warranties are involved

Object ID:    0468
Object name:  HISTPAID
Translation:  Historical payments made

Object ID:    0680
Object name:  PENDORDER
Translation:  Value of pending orders including the current one

Object ID:    5432
Object name:  YRSHIST
Translation:  Number of years of this relationship

Object ID:    6543
Object name:  AVEANNUAL
Translation:  Average annual order value for all customers

Standard operator translations:

Number   Operator   Translation
1        =          equal to
2        #          not equal to
3        >          greater than
4        <          less than
5        >=         greater than or equal to
6        <=         less than or equal to
7        $          contained in
8        !$         not contained in

The resulting natural language interpretation 12: 

The Conclusion object for required management approval 
will fire if: 

Contract type is "Standard" is equal to YES. 

A value of "YES" will be assigned to this object if:
Contract value is greater than 25000 

AND either warranties are involved is equal to YES, 

OR the sum of historical payments made and the value of 
pending orders including the current one, when 
divided by the number of years of this relationship, is 
greater than or equal to the average annual order 
value for all customers. 

A value of "NO" will be assigned in all remaining cases. 



Explanation and Diagnostic Utility 7 

The invention includes an Explanation and Diagnos- 
tic Utility which allows the developer and/or the user 
to examine the reasoning processes at work in the sys- 
tem. This utility is part of both the Knowledge Base 
Development Program 4 and the Object Processing 
Program 9. A request for an explanation is usually ask- 
ing one of the following questions: 

Why is this question being asked? 

Why is this question or conclusion relevant? 

How was that conclusion drawn? 

Why is this message or output being asserted as true? 

In each case, what is being requested is an explanation for the behavior of a particular object, i.e. why did the object fire. A less common request, but one which can be quite illuminating for a user, is: "Why did this object NOT fire?" Very few expert systems have the capability to explain why events did not happen.

To explain to the user the reasoning used in the analysis to arrive at a certain result, the Explanation and Diagnostic Utility looks to the relevance criteria of the objects involved, and constructs a natural language explanation 13 based on the relevance criteria statements found. Each object in the knowledge base has a set of relevance criteria statements (except for objects that always fire, which are trivial to explain), and this set of statements constitutes a complete enumeration of how the object should behave. Therefore, the explanation process for an object that fired is simply to examine its relevance criteria, determine which statements evaluated to "true", identify the pathway(s) that succeeded, and construct the explanation using those statements. In the case of explaining why an object did not fire, each pathway is examined and the statement(s) which evaluated to "false" are identified and presented as requirements that were not satisfied, collectively causing each of the possible pathways to fail.
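
The two explanation cases can be sketched in Python as follows. This is a hedged illustration: `explain_firing` is a hypothetical name, pathways are modeled as lists of statement identifiers, and a plain dictionary of booleans stands in for the Statement Truth Control String.

```python
def explain_firing(pathways, truth):
    """Say why an object fired, or which statements kept it from firing.

    pathways: list of pathways, each a list of statement identifiers;
              any one fully true pathway is enough for the object to fire.
    truth:    mapping from statement identifier to its current truth value.
    """
    if not pathways:
        # Objects without relevance criteria always fire.
        return {"fired": True, "pathways": []}
    succeeded = [p for p in pathways if all(truth[s] for s in p)]
    if succeeded:
        return {"fired": True, "pathways": succeeded}
    # Not fired: for each pathway, list the statements that evaluated to false.
    unmet = [[s for s in p if not truth[s]] for p in pathways]
    return {"fired": False, "unmet": unmet}


# Hypothetical object with two pathways: (0321 and 0432) or (0321 and 0543).
pathways = [["0321", "0432"], ["0321", "0543"]]
print(explain_firing(pathways, {"0321": True, "0432": False, "0543": True}))
print(explain_firing(pathways, {"0321": True, "0432": False, "0543": False}))
```

The first call reports the second pathway as the reason for firing; the second call lists, per pathway, the unmet statements that together prevented firing.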

When a request for explanation is initiated by the user, the system retrieves the data for the object from the Generated Data File 14. The alternative pathways for firing the object are extracted from this data, and each statement within each pathway is evaluated for truth by examining its character in the Statement Truth Control String 22. Successful pathways are identified, as are the statements which will be used in the explanation. The data for each such statement is then retrieved and its generated code is extracted. A natural language version of the statement is created by substituting object translations for object references within the code statement, by using natural language versions of the operators used, and by presenting the current values of referenced objects to support the truth values of the statements.

For example, suppose an object fires and asks a question of the user, and the user requests an explanation for why this question is being asked. The object's data includes its relevance criteria statements. Suppose there are two alternative pathways, each consisting of a single statement. The truth values of the two statements are looked up in the Statement Truth Control String, and it is determined that the object's second pathway caused the object to fire because only the second statement evaluates to "true". The second statement's address in the Generated Data File is ascertained by consulting the Statement Address Control String 28, and the statement's data is retrieved. Suppose that the statement's generated code reads as follows:

O_0245+O_3299>=100,000

The current values for the two objects referenced are retrieved from the Object Values File 30. That file also records the sequence number of each object. These sequence numbers allow the object's address in the Generated Data File to be identified by consulting the Object Address Control String 26. Each object's data is read into memory and its translation is ascertained. Suppose the translation for the object with the identifier "0245" is "the number of units manufactured last year", and the translation for object "3299" is "the number of units ordered this year", and that their current values are 65,000 units and 40,000 units, respectively. The resulting explanation would be presented to the user on screen, as follows:

This question is asked because the number of units manufactured last year plus the number of units ordered this year is greater than or equal to 100,000.
The number of units manufactured last year: 65,000
The number of units ordered this year: 40,000

Object translations in the explanation text are high- 
lighted in a different color, so that the user knows these 
represent other objects in the system. Each such high- 
lighted text region behaves like a menu option. If the 
user selects "The number of units manufactured last 
year", the explanation process is repeated for object 
"0245", and an explanation for that object will appear 
on screen. This allows the user to follow the system's 
reasoning backward, to see what logic applied and to 
check the values of the objects involved. 
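
The construction of the on-screen explanation from a statement's generated code can be sketched in Python. The sketch is illustrative only: the `explain` function and its phrase table are hypothetical, only the ">=" and "+" symbols are handled, and plain dictionaries stand in for the object translations and the Object Values File.

```python
import re

# Natural language phrases for the symbols used in this example only.
PHRASES = {">=": "is greater than or equal to", "+": "plus"}


def explain(code, translations, values):
    """Build the explanation text for one statement's generated code."""
    text = code
    # Replace operator symbols first, then substitute object translations.
    for symbol, phrase in PHRASES.items():
        text = text.replace(symbol, f" {phrase} ")
    text = re.sub(r"O_(\d{4})", lambda m: translations[m.group(1)], text)
    lines = [f"This question is asked because {' '.join(text.split())}."]
    # List each referenced object once, with its current value as support.
    for ref in dict.fromkeys(re.findall(r"O_(\d{4})", code)):
        label = translations[ref]
        lines.append(f"{label[0].upper()}{label[1:]}: {values[ref]:,}")
    return "\n".join(lines)


translations = {"0245": "the number of units manufactured last year",
                "3299": "the number of units ordered this year"}
values = {"0245": 65000, "3299": 40000}
explanation = explain("O_0245+O_3299>=100,000", translations, values)
print(explanation)
```

In an interactive version, each substituted translation would also be a selectable region that re-runs the explanation for the referenced object.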

The Explanation and Diagnostic Utility is also used at 
the development level in order to diagnose and debug 
object firing behavior during development. Developers 
can evaluate the behavior of any object, and by select- 
ing displayed translations as menu options, can follow 
links between objects, navigating inter-object dependency relationships at will.

Knowledge Base Diagramming Utility 16 

As a supporting program for the listing of the knowl- 
edge base for documentation purposes, a program pro- 
duces a diagram of all inter-object dependency relation- 
ships as a pictorial form of documentation 17. The pro- 
gram examines all attributes of each object in the de- 
fined sequence to identify all objects on which it de- 
pends. A file is produced showing all objects, with 
connecting lines drawn between objects to represent all 
dependency relationships. 
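
As a rough Python sketch of how such a dependency listing might be derived, the following scans every attribute of each object for references written in curly brackets. The `dependencies` function, the abbreviated import path, and the choice to ignore an object's reference to itself are assumptions made for illustration, and the text output stands in for the drawn connecting lines.

```python
import re


def dependencies(objects):
    """Map each object name to the sorted names it references in any attribute."""
    graph = {}
    for name, attributes in objects.items():
        found = set()
        for text in attributes.values():
            # Object references appear as {NAME} inside attribute text.
            found.update(re.findall(r"\{(\w+)\}", text))
        found.discard(name)  # assumption: a self-reference is not a dependency
        graph[name] = sorted(found)
    return graph


# A few objects in the style of the stock example; import path abbreviated.
objects = {
    "STOCKNAME": {"Valid": "LEN(ALLTRIM({STOCKNAME})) > 0"},
    "PRICENOW": {"Import": ".../{STOCKNAME}/PRICE"},
    "VALIDSTOCK": {"Values": "YES»{PRICENOW} > 0 | NO"},
    "GAIN": {"Values": "{SALEVALUE} - {TOTALCOST}",
             "Criteria": "{OWNSTOCK} = YES"},
}
graph = dependencies(objects)
for name, upstream in graph.items():
    for source in upstream:
        print(f"{source} --> {name}")
```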

Example of an Application 

For illustration purposes, here is a trivial expert sys- 
tem to evaluate the advisability of buying or selling 
stock. If the stock is currently owned, the system will 
evaluate the historical cost of the stock owned, and 
compare it with the current price per share. It will 
recommend selling the stock if a gain can be realized on 
the sale, else it will recommend holding the stock. If the 
stock is not currently owned, the system will look to see 
how much cash is available in the investment account, 
and report how many shares of the stock could be pur- 
chased with available funds. For simplicity of illustra- 
tion, the possibility of buying additional shares of a 
stock that is currently owned is ignored, and commis- 
sions on the transactions involved are also ignored. 
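
The decision logic just described can be summarized in a few lines of Python. This is only a sketch of the intended behavior, not the object-based knowledge base itself: the `advise` function and its parameter names are hypothetical, and the handling of an unknown price and of zero affordable shares are assumptions beyond the prose description.

```python
def advise(price_now, owned_shares=0, cost_per_share=0.0, cash_available=0.0):
    """Return a recommendation for one stock, ignoring commissions."""
    if price_now <= 0:
        # Assumption: a non-positive price means the stock was not found.
        return "UNKNOWN STOCK"
    if owned_shares > 0:
        # Stock is owned: compare sale value with historical cost.
        gain = price_now * owned_shares - cost_per_share * owned_shares
        return "SELL" if gain > 0 else "HOLD"
    # Stock is not owned: report how many whole shares the cash would buy.
    buyable = int(cash_available / price_now)
    return f"BUY {buyable} shares" if buyable > 0 else "DO NOTHING"


print(advise(50, owned_shares=10, cost_per_share=40))
print(advise(30, owned_shares=10, cost_per_share=40))
print(advise(25, cash_available=1000))
```

In the knowledge base itself, each intermediate quantity is a separate object with its own relevance criteria, so that only the objects relevant to the user's situation fire.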

Each object can have numerous possible attributes. In 
this example, only those attributes of immediate interest 
are shown, for clarity of presentation. 

The knowledge base will require an application re- 
cord to hold application-wide and module-specific attri- 
butes: 



Attributes            Values
Application number:   1
Application name:     Simple Stock Advisor
Module number:        1
Module name:          Simple Stock Advisor

In this simple application, we are calling the applica- 
tion as a whole Application 1. It has one module, which 
is given the number 1. In an application that had more 
than one module, each module would be given an indi- 
vidual name, and each such name would likely differ 
from that of the application as a whole. 

Now the following objects are created: 
Attributes      Values

Name:           STOCKNAME
Type:           Screen, user input
Sequence:       1
Answer length:  25
Text:           Enter the name of the stock:
Criteria:       <none>
Valid:          LEN(ALLTRIM({STOCKNAME})) > 0
Error:          You must provide the name of a stock here.

Name:           PRICENOW
Type:           Conclusion, import
Sequence:       2
Import:         MARKET.DBF/MKTNAMES.IDX/{STOCKNAME}/PRICE
Criteria:       <none>

Name:           VALIDSTOCK
Type:           Conclusion
Sequence:       3
Values:         YES»{PRICENOW} > 0 | NO
Criteria:       <none>

Name:           NOSTOCKMSG
Type:           Screen, message
Sequence:       4
Text:           I could not find a stock listed under the name {STOCKNAME}. Please be sure you have the correct name for your stock of interest.
Criteria:       {VALIDSTOCK} = NO

Name:           STARTOVER
Type:           System
Sequence:       5
Values:         Reset
Criteria:       {VALIDSTOCK} = NO

Name:           OWNED_SHARES
Type:           Conclusion, import
Sequence:       6
Import:         PRTFOLIO.DBF/STKNAME.IDX/{STOCKNAME}/SHARES
Criteria:       <none>

Name:           OWNSTOCK
Type:           Conclusion
Sequence:       7
Values:         YES»{OWNED_SHARES} > 0 | NO
Criteria:       <none>

Name:           COSTPERSHR
Type:           Conclusion, import
Sequence:       8
Import:         PRTFOLIO.DBF/STKNAME.IDX/{STOCKNAME}/COST_SHR
Criteria:       {OWNSTOCK} = YES

Name:           TOTALCOST
Type:           Conclusion
Sequence:       9
Values:         {COSTPERSHR} * {OWNED_SHARES}
Criteria:       {OWNSTOCK} = YES

Name:           CASHAVAIL
Type:           Conclusion, import
Sequence:       10
Import:         INVEST.DBF/FACTORS.IDX/"Cash Available"/AMOUNT
Criteria:       {OWNSTOCK} = NO

Name:           SALEVALUE
Type:           Conclusion
Sequence:       11
Values:         {PRICENOW} * {OWNED_SHARES}
Criteria:       {OWNSTOCK} = YES

Name:           BUYABLE_SHARES
Type:           Conclusion
Sequence:       12
Values:         INT({CASHAVAIL}/{PRICENOW})
Criteria:       {OWNSTOCK} = NO

Name:           BUYVALUE
Type:           Conclusion
Sequence:       13
Values:         {PRICENOW} * {BUYABLE_SHARES}
Criteria:       {OWNSTOCK} = NO

Name:           GAIN
Type:           Conclusion
Sequence:       14
Values:         {SALEVALUE} - {TOTALCOST}
Criteria:       {OWNSTOCK} = YES

Name:           OWN_RECOMMEND
Type:           Conclusion
Sequence:       15
Values:         SELL»{GAIN} > 0 | HOLD»{GAIN} <= 0
Criteria:       {OWNSTOCK} = YES

Name:           NEW_RECOMMEND
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Attributes 

Type: 

Sequencer 

Values: 

Criteria: 

Name: 

Type: ' 

Sequence: 

Values: 

Criteria: 

Name: 

Type: 

Sequence: 

Text: 



Criteria: 

Name: 

Type: 

Sequence: 

Text: 



Criteria: 

Name: 

Type: 

Sequence: 

Text: 



Criteria: 

Name: 

Type: 

Sequence: 

Text: 



Criteria: 



-continued 



Values 

Conclusion 
16 

BUY»{ BUY ABLE SHARES} > 0 I DO NOTHING 

{OWNSTOCK} = NO 

RECOMMENDATION 

Conclusion 

17 

{OWN__RECOMMEND}»{OWNSTOCK} « YES | 

{NEW_RECOMMEND} 
<sone> 
BUYMSG 

Screen (message), Output 
18 

The stock being considered is: {STOCKNAME} 

I recommend that you buy {BUYABLE— SHARES} 
shares of this stock at the current market 
price of {PRICENOW} per share. The total 
value of this transaction would be 
${BUYVA3LUE}. 

{RECOMMENDATION} = BUY 

SELLMSG 

Screen (message), Output 
19 

The stock being considered is: {STOCKNAME} 

I recommend that you sell the {OWNED_SHARES} 
shares of this stock that yon now own, at the 
current market price of {PRICENOW} per share. 
The total proceeds from this transaction 
would be ${SALEVALUE}. Your cost for these 
shares is ${TOTALCOST}, and your resulting 
gain is therefore S{GAIN}. 

{RECOMMENDATION} = SELL 

HOLDMSG 

Screen (message), Output 
20 

The stock being considered is; {STOCKNAME} 

I recommend that you hold the {OWNED— SHARES} 
shares of this stock that yon now own. The 
current market price of the stock is 
{PRICENOW} per share. Your average cost for 
these shares is StCOSTPERSHR} per share, and 
if you sold your shares the resulting loss 
would therefore be ${GAIN}. 

{RECOMMENDATION} « HOLD 

NOTHTNGMSG 

Screen (message), Output 

21 

The stock being considered is: {STOCKNAME} 
I recommend that you do nothing with respect 
to this stock. You do not own any shares at 
this time, and there is no cash available in 
your investment account to enable a purchase 
of shares. 

{RECOMMENDATION} = DO NOTHING 



Narrative description of system operation 

When the system is run, the first object, STOCK- 
NAME, is processed. There are no relevance criteria 
specified for this object, so it always fires. It asks the 
user to enter the name of the stock of interest, allowing 
25 characters for the stock name. A validity test is in- 
cluded as an attribute of the object which requires that 
the stock name not be blank. The test applies the ALLTRIM()
function to strip leading and trailing spaces
from the character string, and tests to see that the re- 
maining string length is greater than zero. If not, the 
entire 25 characters were left blank and the error mes- 
sage is displayed, followed by another opportunity to 
enter a stock name (pressing the Escape key would 
terminate the consultation). 

When a name is entered, the consultation proceeds to
the next object, PRICENOW. This is an import that
looks to an external data file of all stocks (MARKET.DBF),
which is indexed by stock name (MKTNAMES.IDX). No action
criteria are assigned to the import, so it is always attempted.
The value of the STOCKNAME object is used as an index key and
the system will try to locate a record for the stock of interest.
(Curly brackets ("{}") always indicate a reference to an object.)

If the stock is found in the file, the value of the record's
PRICE field of the MARKET.DBF database will be returned, to
become the value of the PRICENOW object. If the stock is not
found, the import fails, and the value returned will be zero.

The next object, VALIDSTOCK, uses the value of
the PRICENOW object to determine whether the stock
name entered exists. If the value is greater than zero, the
import must have succeeded, and a value of "YES" is
assigned to the VALIDSTOCK object. Note the format of the
Values attribute for this object. Possible values are
separated by the character "|". Within a value, action
criteria for that particular value follow the value itself,
separated by the character "■" (ASCII 254).

Values are evaluated in the order specified in the
attribute. If the value of the PRICENOW object is zero,
then the import failed. When considering the first value
for VALIDSTOCK, the action criteria statement for
the YES value ({PRICENOW} > 0) will evaluate to
false, and therefore the YES value will be rejected. The
next value in the series, NO, has no associated action
criteria, and therefore it is always assigned. Note that
the sequential evaluation of candidate values in effect
implies action criteria for the NO value. If the action
criteria for the earlier value are satisfied, the NO value
is never considered. If it is considered, then all earlier
criteria have failed, and this value covers all remaining
cases.
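
The sequential evaluation of candidate values can be sketched in Python as follows. This is only an illustration, not the patent's implementation: the names parse_values, conclude and make_holds are hypothetical, the "■" character stands in for the ASCII 254 separator, and the criteria evaluator handles nothing beyond the single form "{NAME} > number".

```python
SEP = "\u25a0"  # stand-in for the ASCII 254 separator between value and criteria


def parse_values(attr):
    """Split a Values attribute into (value, criteria-or-None) pairs."""
    candidates = []
    for part in attr.split("|"):
        value, _, criteria = part.partition(SEP)
        candidates.append((value.strip(), criteria.strip() or None))
    return candidates


def conclude(attr, holds):
    """Return the first candidate whose action criteria are satisfied.

    A candidate without criteria is always accepted, so placing it last
    makes it the catch-all for every remaining case.
    """
    for value, criteria in parse_values(attr):
        if criteria is None or holds(criteria):
            return value
    return None


def make_holds(objects):
    """Hypothetical evaluator: understands only '{NAME} > number'."""
    def holds(criteria):
        name, op, number = criteria.split()
        assert op == ">"
        return objects[name.strip("{}")] > float(number)
    return holds


attr = "YES" + SEP + "{PRICENOW} > 0 | NO"
print(conclude(attr, make_holds({"PRICENOW": 12.5})))  # import succeeded
print(conclude(attr, make_holds({"PRICENOW": 0})))     # import failed
```

Because candidates are tried in order, the NO value needs no criteria of its own; it is reached only when the criteria for YES have failed.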

No relevance criteria are specified for the PRICENOW
and VALIDSTOCK objects, for we always
want to fire them in order to ensure that we are dealing
with a valid stock.

The NOSTOCKMSG object will fire only if its relevance
criteria are satisfied, i.e. only if the value of the
VALIDSTOCK object is equal to NO. If VALIDSTOCK = YES,
then this object will simply be bypassed as inapplicable.
If it fires, it displays its text as a
message on the screen, and asks that the user press any
key to continue after reading the message.

Note the embedded value of the STOCKNAME
object in the text of this screen message object. When
formatting the text for this display, the system will
evaluate this embedded object reference and will substitute
the user-entered stock name for the reference. The
first line of the message might therefore read:

I could not find a stock listed under the name: ACME
PRODUCTS.

Note also that, except where such an embedded ob- 
ject reference is substituted, all text in the text attribute 
will be displayed as formatted, including the period 
after the embedded reference. 
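
A minimal Python sketch of this substitution step is given below. The function name render and the choice to leave unknown references untouched are assumptions made for illustration; the source does not say how an unresolved reference is displayed.

```python
import re


def render(text, values):
    """Replace each {NAME} reference with the object's current value."""
    def lookup(match):
        name = match.group(1)
        # Assumption: a reference to an object with no value is left as-is.
        return str(values.get(name, match.group(0)))
    return re.sub(r"\{([A-Za-z0-9_]+)\}", lookup, text)


message = "I could not find a stock listed under the name {STOCKNAME}."
print(render(message, {"STOCKNAME": "ACME PRODUCTS"}))
```

All text outside the braces, including the period after the reference, passes through unchanged.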

If VALIDSTOCK = NO, then the next object,
STARTOVER, will fire, else it will be ignored. This is
a system object to initiate the "reset" action, which will
release all object values and start the consultation over
again from the beginning.

Note here the importance of the fact that all objects
are processed in a predetermined sequence. The remaining
objects in the consultation are processed only if
the consultation successfully passes the STARTOVER
object: all subsequent objects may therefore assume that
a valid stock name is being considered. The practical
effect of this assumption is that no further references to
the value of the VALIDSTOCK object are required in
the relevance criteria of following objects. For example,
the next object in the sequence, OWNEDSHARES,
has no relevance criteria and will therefore
always fire. A reasonable criteria statement for firing
this and all subsequent objects might be
"{VALIDSTOCK} = YES", but this is represented implicitly by
the fact that the object is being processed at all.

The OWNEDSHARES object imports the number
of shares of the stock currently owned from the external
data file PRTFOLIO.DBF. This file is also indexed by
stock name, and the value of STOCKNAME is again
used to locate the appropriate record. The value of the
SHARES field of the PRTFOLIO data file is returned
to become the value of the OWNEDSHARES object.
If the stock cannot be located in the file, the import fails
and the value of this object is zero.

The OWNSTOCK object that is processed next determines
whether the stock is currently owned, based
on the value obtained by the OWNEDSHARES import.
This object now becomes a watershed in the analysis,
for the behavior of the remainder of the analysis
will be affected by whether the stock is currently
owned. Thus the relevance criteria of following objects
will make frequent references to this object.

Note that this object in effect serves some of the
functions of a node in a decision tree, causing the logic
of the system to branch. If the logic at work here were
to be represented as a flow chart or a decision tree, the
analysis would branch one direction or the other, based
on whether stock was in fact owned. This can be referred
to as "pruning" the decision tree (i.e. rejecting
from the analysis the paths not taken), thus reducing the
search space of the analysis.

The next object, COSTPERSHR, performs another 
import to retrieve the historical cost per share of the 
shares owned. This object will fire only if the stock is in 
fact owned. 

The TOTALCOST object that follows is a simple 
calculation of the total cost of the owned stock, ob- 
tained by multiplying the number of shares owned by 
the cost per share. 

CASHAVAIL is an import from another external
file, INVEST.DBF. This file, for purposes of illustration,
is a hypothetical file which contains information of
interest to the investor, including the amount of cash 
available for purchasing new stock. This object fires if 
the stock of interest is not currently owned: the system 
is checking to see if funds are available to purchase the 
stock. The INVEST file is a data file indexed on various 
descriptive phrases which might be factors of interest to 
the investor. The lookup index key in this case is the 
phrase "Cash Available", and the value of the 
AMOUNT field is returned. 

The next object, SALEVALUE, computes the pro- 
ceeds resulting from the sale of owned stock. It multi- 
plies the current market price by the number of shares 
currently owned. It fires only if stock is in fact currently 
owned. 

The BUYABLE_SHARES object divides the current
market price per share of the stock into the available
cash to calculate how many shares of the stock
could be purchased. The INT() function is used to return
just the integer portion of this calculation, to avoid
fractional shares. This calculation is only performed if
the stock is not currently owned. The next object,
BUYVALUE, computes the total cost of such a purchase,
which may be something less than the cash available
if the BUYABLE_SHARES calculation resulted
in fractional shares.

The next object, GAIN, computes the difference 
between the potential total sale value of owned stock 
and the total historical cost of the stock, to yield a gain 
(or loss, if negative) on a sale. This calculation of course 
only occurs if stock is owned. 
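
The objects described so far can be sketched in Python as an ordered list that is processed once from top to bottom. This is an illustrative sketch only: the dictionaries MARKET and PORTFOLIO are hypothetical stand-ins for the indexed data files, only the "stock is owned" branch is modelled, and an early return stands in for the NOSTOCKMSG and STARTOVER objects.

```python
# Hypothetical stand-ins for MARKET.DBF and PRTFOLIO.DBF, keyed by stock name.
MARKET = {"ACME PRODUCTS": {"PRICE": 20.0}, "WIDGETCO": {"PRICE": 5.0}}
PORTFOLIO = {"ACME PRODUCTS": {"SHARES": 10, "COST_SHR": 15.0}}


def imported(table, key, field):
    """Look up a field by index key; a failed import yields zero."""
    return table.get(key, {}).get(field, 0)


def always(v):
    return True


def owned(v):
    return v["OWNSTOCK"] == "YES"


# (name, relevance criteria, value rule), listed in firing sequence.
OBJECTS = [
    ("PRICENOW", always, lambda v: imported(MARKET, v["STOCKNAME"], "PRICE")),
    ("VALIDSTOCK", always, lambda v: "YES" if v["PRICENOW"] > 0 else "NO"),
    ("OWNEDSHARES", always, lambda v: imported(PORTFOLIO, v["STOCKNAME"], "SHARES")),
    ("OWNSTOCK", always, lambda v: "YES" if v["OWNEDSHARES"] > 0 else "NO"),
    ("COSTPERSHR", owned, lambda v: imported(PORTFOLIO, v["STOCKNAME"], "COST_SHR")),
    ("TOTALCOST", owned, lambda v: v["COSTPERSHR"] * v["OWNEDSHARES"]),
    ("SALEVALUE", owned, lambda v: v["PRICENOW"] * v["OWNEDSHARES"]),
    ("GAIN", owned, lambda v: v["SALEVALUE"] - v["TOTALCOST"]),
]


def consult(stockname):
    """Process every object in sequence, firing only the relevant ones."""
    values = {"STOCKNAME": stockname}
    for name, relevant, rule in OBJECTS:
        if relevant(values):
            values[name] = rule(values)
        if name == "VALIDSTOCK" and values[name] == "NO":
            return values  # stands in for NOSTOCKMSG followed by STARTOVER
    return values


print(consult("ACME PRODUCTS")["GAIN"])
print("GAIN" in consult("WIDGETCO"))
```

Objects whose relevance criteria fail simply never acquire a value, which is how the untaken branch of the analysis is pruned.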

Two conclusion objects now follow, OWN_RECOMMEND
and NEW_RECOMMEND. If stock is
currently owned, the OWN_RECOMMEND object
looks to the GAIN object to determine if a sale of the
stock would result in a gain or a loss, and recommends
SELL or HOLD accordingly. If the stock is not currently
owned, the NEW_RECOMMEND object
checks the value of the BUYABLE_SHARES object
to see if it is possible to buy any shares at this time. If so,
it recommends a BUY, else it advises the investor to DO
NOTHING.

A following conclusion object, RECOMMENDATION,
collects the various possible recommendations
of the system in a single object, to which later objects
may conveniently point. Alternatively, these three objects
could have been handled in a single RECOMMENDATION
object which would cover all of the
possibilities at once with somewhat more complex action
criteria for the possible values:

Values: SELL■{OWNSTOCK} = YES AND {GAIN} > 0 |
        HOLD■{OWNSTOCK} = YES AND {GAIN} <= 0 |
        BUY■{OWNSTOCK} = NO AND {BUYABLE_SHARES} > 0 |
        DO NOTHING

Finally, each of the four possible recommendations of
the system has an associated message advising the user
what to do. Various object values are embedded in
these messages, in order to present appropriate information
about the circumstances and possibilities. Because
of their mutually exclusive relevance criteria, one and
only one of these message alternatives will in fact fire.

These message objects are also designated as Output
objects. The message object that fires will have its formatted
text preserved, complete with embedded values,
and this text can be stored to a file or printed as a page
of results. Alternatively, the object's system identifier
can be written to an external file, in order to identify the
output of the consultation. Another system might use
this result as one of its inputs.

Program Control Variations

Many problems can be addressed simply by starting
at the beginning of the object sequence and processing
all objects until the end of the sequence is reached,
whereupon the system terminates its execution. Some
problems, however, require different approaches to
program control. These are discussed below.

(1) Repetitive or "Batch" operations

Batch jobs, for example, are supported by providing
a system reset capability and putting the system in a
loop for repeated executions. In this arrangement, the
system typically imports a set of data and operates on it
by processing the full set of objects, taking appropriate
action with the data and the results of the analysis. The
system then resets itself back to its initial state, imports
the next batch of data, and repeats the process until
some terminating condition is satisfied to allow an exit
from the loop.

(2) Progressive Refinement

In a different type of cyclical control structure, the
system performs successive passes on the same set of
data, refining it each time, until some terminating condition
is reached. The first pass through the object list
allows a high-level or "coarse-grained" manipulation of
the data. Each subsequent pass takes the results of the
previous pass as its input, performing more "fine-grained"
manipulation.

Such cycles may also be built into a segment of what
is otherwise a "single-pass" set of objects. This is supported
by system "go-to" objects which, in the absence
of a specified terminating condition (which might be a
logical value, end-of-file condition in some external data
file, or a specified number of iterations), reset the object
counter back to some earlier object in the Object Firing
Control String 20. This creates in effect a "do-while"
looping construct within the otherwise linear set of
objects.

(3) Modules calling other modules

Support is also provided for systems to call other
expert systems as modules to perform sub-tasks, creating
new values or updating the values of objects that
have already been processed, whereupon program control
returns to the calling module. In the current implementation,
the system being called must be another
module of the same application, because object names
and identifiers may be duplicated in different applications,
but are not allowed to be duplicated within modules
of the same application.

A system object in the calling module performs the 
call to the other system. In the process, the current state 
of the calling module is saved, by saving to a file on disk 
important memory variables, including the current ob- 
ject counter 112, the Statement and Object Queue Ar- 
rays 34, 32 and the Control Strings 20-26. The Gener- 
ated Data File 14 for the calling module is closed, but 
the Object Values File 30 remains open and unchanged 
for use by the called module. However, a copy is cre- 
ated of the Object Values File as it exists at the time of 
the call, for use when program control returns. Process- 
ing then begins for the called module, which opens its 
own Generated Data File and proceeds with its consul- 
tation 15. The called module may in turn call another 
module, and there is no limitation on the degree to 
which such inter-module calls may be nested. 

Objects are reused within an application by being 
members of more than one module. If an object that is 
common to the two modules has been processed by the 
calling module, and is therefore found by the called 
module in the existing Object Values File, its value is 
accepted and used in the called module's consultation. 
The value of a common object may also be updated as 
a result of the called module's analysis. Note that almost 
all attributes of a common object may differ in the two 
modules. In particular, the firing behavior of the object 
may differ, because different relevance criteria and 
inter-object dependency relationships may be specified 
in the called module's data. Other attribute data of com- 
mon objects that is stored in the Object Values File will 
be updated according to the attribute specifications of 
the called module, as part of its normal processing. 

For example, suppose an object is a screen object in
the calling module, and it fires and acquires the value
"XYZ". In the called module, this same object might be
a conclusion object. If it fires in the called module, its
initial value in that context is still "XYZ", but the alternative
values for the object as a conclusion will be processed,
perhaps changing its value to "ABC". The
called module will update the Object Values File to
reflect this new value, and will also update the object
type attribute in that file to note that this object is a
conclusion.
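
A minimal Python sketch of such a call is given below, assuming the Object Values File can be modelled as a dictionary of records. The names call_module and called are hypothetical, and only one non-value attribute ("type") is tracked. The copy made at call time is used on return to put that attribute back, while the value set by the called module is kept.

```python
import copy


def call_module(values_file, called_module):
    """Run a called module against the shared values, then return control.

    Values changed by the called module are kept; the "type" attribute of
    each object known to the caller is restored from the copy made at the
    time of the call.
    """
    snapshot = copy.deepcopy(values_file)   # copy made at the time of the call
    called_module(values_file)              # called module uses the same file
    for name, record in values_file.items():
        if name in snapshot:
            record["type"] = snapshot[name]["type"]
    return values_file


def called(values_file):
    # In the called module the common object is a conclusion.
    values_file["OBJ"]["value"] = "ABC"
    values_file["OBJ"]["type"] = "conclusion"
    # An object that belongs to the called module only.
    values_file["EXTRA"] = {"value": 1, "type": "conclusion"}


shared = {"OBJ": {"value": "XYZ", "type": "screen"}}
call_module(shared, called)
print(shared["OBJ"])
```

After the call the common object carries the updated value but is once again recorded as a screen object for the calling module.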

After processing is complete in the called module, 
program control returns to the calling module. At that 
time the calling module's Generated Data File is re- 
opened, and the Statement and Object Queue Arrays, 
Control Strings and other memory variables for the 
calling module are reinstated from the file saved to disk. 
A comparison is then made between the Object Values 
File and the copy of the file that was made at the time 
of the call. The values of objects in the file are not
disturbed, for typically the reason for calling the module
in the first place was to acquire different values for
one or more common objects as a result of the called
module's processing. Other attribute data that is stored
in the Object Values File, however, is restored to its
original state, to match the attribute specifications of the
calling module. In the example just given, the value of
the common object upon return to the calling program
will be its updated value, "ABC". The object type attribute
stored in the Object Values File, however, is
changed back to reflect the fact that this object is a
screen object in the calling module, rather than a conclusion.

Processing then resumes normally in the calling module.
The Object Values File will now contain updated values,
along with records for objects that
are members of the called module only. These extra
object records are ignored in the subsequent processing
of the calling module and do not affect its analysis in
any way.

When another module is called in this manner and
updates the value of objects common to both modules in
the Object Values File, followed by a return of control
to the calling module, the practical effect is a "blackboard"
system of cooperating expert systems which
share knowledge through the values of the common
objects.

(4) Event-driven applications

Monitoring, reactive, and other event-driven applications
are supported by establishing conceptual
"threads" in the overall list of objects, based on inter-object
dependency relationships. A thread is a continuous
sequence of objects that forms a conceptually coherent
subset of the overall object list, and constitutes a
section of the Object Firing Control String 20. An object
at the beginning of a thread is called the "thread
initiator" object, and the final object in the thread is the
"thread terminator" object.

An external monitoring routine waits for certain conditions
to arise, then resets the object counter to the
initiator object for the appropriate thread. The system
then processes objects normally for the length of the
thread, observing object sequence, firing applicable
objects and taking appropriate actions, and after processing
the terminator object of the thread, returns
control to the monitoring routine. Such a design, for
example, might be appropriate for monitoring pressure
valves in a hydraulic system. Threads could be allowed
to overlap, where objects are common to more than one
thread. Such an arrangement is possible because objects
are ordered and processed according to their inter-object
dependency relationships, as determined by their
relevance criteria.

Certainty Factors

Certainty factors qualify the applicability of criteria
and object values by providing a measure of likelihood
or confidence that a given condition is true, thereby
supporting to some extent reasoning under conditions of
uncertainty. A system supporting the use of certainty
factors, for example, might conclude a given fact's
value with an 80% level of confidence, and another
fact's value with only a 50% level of confidence. Several
algorithms exist for combining certainty factors
when such factors interact, to produce a new, composite
certainty factor for a resulting value.

Many, perhaps most, expert system applications do
not require certainty factors. When solving problems,
experts use them in some situations to evaluate degrees
of belief or likelihood. Most of the time, however, the
algorithms involved in combining certainty factors are
too complex for an expert to use in practice, and the
mere fact that uncertainty exists becomes an additional
fact that is evaluated with 100% confidence together
with other facts in the analysis. Also, expert systems are
frequently designed not to evaluate and deal with uncertainty,
but to eliminate it. Thus an expert system
application in a business setting may enforce company
policies predictably and uniformly by encoding the
requirement that "if this condition of uncertainty exists,
then always do the following".

This reveals an important distinction when thinking
about uncertainty. Uncertainty can be represented directly
in a system, or [...] knowledge as if it were certain and
then adjusting the [...] about the applicability of the
encoded knowledge.

For example, compare the assertion "There is a 60%
chance of rain tomorrow", asserted with a 100% confidence
level, with the assertion "It will rain tomorrow",
asserted with a 60% confidence level. The first assertion
represents uncertainty directly, and there is no ambiguity
about it: other elements in the system may use it and
reason about it with 100% confidence that it applies. The
second assertion represents uncertainty indirectly, by
making an unambiguous statement, but the assertion is
qualified with a certainty factor to indicate that it may
not be correct.

The invention supports both alternatives for representing
uncertainty. The first method relies on the inherent
meaning of an object to represent uncertainty,
while firing the object under specified conditions with
100% confidence that the object should be fired. A
conclusion object may fire and conclude that "X is
uncertain", and the system can then use that fact in its
analysis. Knowledge about what the system should do if
the object is applicable, i.e. under conditions where
"X" is uncertain, is encoded in the relevance and action
criteria of other objects, whose behavior may be affected
by this condition.

The second method for representing uncertainty employs
certainty factors directly, in several contexts of
the knowledge base. Certainty factors can be assigned
directly (e.g., "80%"), or may be derived by evaluating
an expression.

(1) A certainty factor can be assigned as an attribute
of an object, and when the object fires and acquires a
value, this certainty factor is stored in the Object Values
File 30.

(2) Certainty factors can be assigned to the values
that conclusion objects may acquire, to say in effect:
"under these conditions, fire this object and give the
resulting value the following certainty factor". This
allows different certainty factors to be assigned to different
alternative values of the same object.

If a given value should be concluded with a different
certainty factor under different conditions, the value is
repeated as a separate alternative value for the object,
with its own dedicated set of action criteria. The resulting
effect is to say: "under these conditions, conclude
this value for the object with a 90% confidence level,
but under these conditions conclude that same value
with only a 60% confidence level".

(3) Criteria statements can acquire certainty factors as
a result of the combined certainty factors of objects
referenced in them, and each statement in a set of criteria
can be assigned a certainty factor threshold, to say in
effect: "in order to evaluate to true, this statement must
have a certainty factor of at least 75%". The same statement
could be re-used in the context of another object's
criteria, using a different certainty factor threshold.

(4) A statement may make an explicit evaluation of an
object's certainty factor by employing the CF() function,
a function in the Object Processing Program,
which returns an object's certainty factor, as recorded
in the Object Values File. For example, a statement
might read:

CF(OBJECT_12) > .8

This statement would evaluate to "true" if the certainty
factor of OBJECT_12 is greater than 80%.
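
Items (3) and (4) can be illustrated with a short Python sketch. It assumes the Object Values File can be modelled as a dictionary, and it does not model how a statement's certainty factor is combined from the objects it references, since the text leaves the combining algorithm open; the caller passes that factor in. The name statement_true is hypothetical.

```python
# Hypothetical in-memory stand-in for the Object Values File.
values_file = {
    "OBJECT_12": {"value": "YES", "cf": 0.85},
    "OBJECT_13": {"value": "NO", "cf": 0.50},
}


def CF(name):
    """Return the certainty factor recorded for an object."""
    return values_file[name]["cf"]


def statement_true(holds, cf, threshold=0.0):
    """A statement counts as true only if it holds and its certainty
    factor reaches the threshold assigned to it in this set of criteria."""
    return holds and cf >= threshold


print(CF("OBJECT_12") > .8)
print(statement_true(True, CF("OBJECT_13"), threshold=0.75))
print(statement_true(True, CF("OBJECT_13"), threshold=0.50))
```

The last two calls show the same statement reused with two different thresholds, as item (3) allows.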

Validation Table Generator 

Inter-object dependency relationships also allow a
means of verifying and demonstrating that an expert
system is behaving correctly, i.e. consistently with its
defined relevance criteria. Because the criteria represent
and provide for all factors which can affect an
object, and since the firing behavior of objects can be
measured, validation tables and test cases can be constructed
to monitor and prove the behavior of each
object in the system.

For this purpose, the invention includes a Validation
Table Generator 5. In a validation table created by this
program 11, the table itself tests one enabling pathway
of an object. Each of the factors that can affect the
behavior of the object is listed down the left side of the
table (see FIG. 12). Each column of the test table contains
a fact situation involving these factors, created by
a particular test case, which is given a test case number
at the top of the column. The table is divided into a
left-hand portion which tests the failure of the object to
fire, and a single, final column on the right-hand side
which tests the firing of the object. In each of the left-hand
columns, all factors of the pathway should be
enabled except for one (a different factor in each column
should be disabled). Since all of the factors represented
within a pathway for an object must be satisfied
in order for the pathway to succeed, in the test cases for
each of these columns, the object should not fire, and it
should not fire because one and only one of the factors
is disabled. In the final column of the table, all factors
should be enabled, and the object should fire.
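
The layout of such a table can be sketched in Python as follows. The function names and the representation of a test case as a dictionary of factor states are assumptions made for illustration; the actual table format is the one shown in FIG. 12.

```python
def validation_table(factors):
    """Build the columns of a validation table for one enabling pathway.

    Each left-hand column disables exactly one factor and expects the
    object not to fire; the final column enables every factor and expects
    the object to fire.
    """
    columns = []
    for disabled in factors:
        case = {factor: (factor != disabled) for factor in factors}
        columns.append((case, False))
    columns.append(({factor: True for factor in factors}, True))
    return columns


def fires(case):
    """A pathway succeeds only if every one of its factors is satisfied."""
    return all(case.values())


table = validation_table(["A", "B", "C"])
for number, (case, expected) in enumerate(table, start=1):
    assert fires(case) == expected
    print(number, case, "fires" if expected else "does not fire")
```

A pathway with three factors therefore yields four test cases: three failures, each attributable to a single disabled factor, and one success.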

Since a given test case will involve the processing of
all objects in the system, and only a few objects will
affect a given pathway, test cases may be re-used from
table to table. The result of creating tables for every
pathway of every object in the system will be a minimized
collection of test case scripts which can be fed
into the system, and the firing behavior of objects can
then be checked against the behavior predicted by the
test tables. In this fashion, mistakes in the specification
of criteria can be checked for, and the correct operation
of the system can be demonstrated empirically. Because
the relevance criteria are stored in the knowledge base,
the creation of test tables and test cases, and the processing
of the test case scripts, can all be automated.

Alternative Embodiments

Three alternative embodiments of the invention have 
been devised, in addition to the preferred embodiment 
disclosed above. These are alternatives for the organiza- 
tion and implementation of the system which is gener- 
ated from the knowledge base, and which actually per- 
forms the expert system's function. Each is described 
below. 

(1) Code generation 

A conventional computer program is generated, in
which statements of computer code are generated for
each object in the knowledge base, in the order of the
defined object sequence. The main body of the program
consists of the objects' relevance criteria, separated into
distinct enabling pathways, and constructed as a series
of nested "if" statements built from the criteria statements.
Thus if a pathway for an object consisted of
three statements (A, B and C), then the generated code
would read:



IF <statement A>
   IF <statement B>
      IF <statement C>
         <statements to fire the object>
      ENDIF
   ENDIF
ENDIF



If a particular "if" statement fails, program control 
will drop to the end of this block of nested statements, 
bypassing further evaluation of statements in that path- 
way. 
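
A generator for this nested structure can be sketched in a few lines of Python. The function name generate_pathway and the two-space indentation are illustrative choices, not part of the source.

```python
def generate_pathway(statements, action):
    """Emit nested IF ... ENDIF lines for one enabling pathway."""
    lines = []
    for depth, statement in enumerate(statements):
        lines.append("  " * depth + f"IF {statement}")
    # The firing code sits at the heart of the nested block.
    lines.append("  " * len(statements) + action)
    for depth in reversed(range(len(statements))):
        lines.append("  " * depth + "ENDIF")
    return "\n".join(lines)


code = generate_pathway(
    ["<statement A>", "<statement B>", "<statement C>"],
    "<statements to fire the object>",
)
print(code)
```

Each additional criteria statement adds one level of nesting, so a failed statement at any depth skips everything inside it.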

Object values, when acquired, are stored in memory 
variables using the object name as the name of the vari- 
able. Lines of generated "if" statements use these same 
names to refer to the object's value. The result is that 
each "if" statement is directly executable, because memory
variables exist for each object statement. All memory
variables to hold object values are created and
initialized at the start of processing, in order to provide 
for references to objects that do not fire. 

The data type of each object's expected value must be
considered when generating this code and when initializing
object memory variables. Variables for objects
containing character data are initialized to the null
string (""), and generated code for character string
comparisons must embed quotation marks correctly.
For example, a line of code might be generated as:

IF OBJECT_19 = "YES"

A memory variable named OBJECT_19 will be created
at system startup and will be initialized to the null
string. Note also the quotation marks around the literal
string YES.

Similarly, variables for objects of numeric data type 
are initialized to zero, logical data types initialized to 
false, and date types initialized to a null date (" / / "). 
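
These initialization and quoting rules can be sketched in Python as follows. The names DEFAULTS, initial_values and comparison are hypothetical, and None is used here merely as a stand-in for the null date.

```python
# Default value per data type; None stands in for the null date (assumption).
DEFAULTS = {"character": "", "numeric": 0, "logical": False, "date": None}


def initial_values(object_types):
    """Create one initialized variable per object before processing starts."""
    return {name: DEFAULTS[kind] for name, kind in object_types.items()}


def comparison(name, kind, literal):
    """Generate an IF line; character literals get embedded quotation marks."""
    text = f'"{literal}"' if kind == "character" else str(literal)
    return f"IF {name} = {text}"


variables = initial_values({"OBJECT_19": "character", "OBJECT_7": "numeric"})
print(variables)
print(comparison("OBJECT_19", "character", "YES"))
print(comparison("OBJECT_7", "numeric", 0))
```

Initializing every variable up front means a generated comparison can safely refer to an object that never fired.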

Because code is generated in the order of object se- 
quence, references to other objects' values in a given 
object's generated code will be meaningful. The refer- 
enced objects will have been processed, and their values 
acquired, prior to the program's execution of code 
which references their values. Default values may be 
assigned either initially, or upon passing the object's 
generated code. If the set of nested "if" statements all
succeed, then program control will reach the code generated
at the heart of the code block. These statements will
initiate appropriate action for the object's firing. If
additional criteria are involved for the object (for example,
criteria governing the importation of data, the appearance
of a given option on a menu, or a conclusion's value),
appropriate subroutines containing similar generated code
are called. If the object is a screen object, a general
purpose procedure is called to handle the screen interface
and acquire a value for the object. Similarly,
general-purpose subroutines are called to handle the
concluding of values for internal conclusion objects, the
posting of messages to the user, and the writing out of
data to an external file.

Code is also generated to handle requests by a user to
interrupt processing, and to handle other common
events in the processing of the system.

In practice, this implementation can be extremely
fast, but it also encounters several difficulties. The
amount of generated code for a large system can be
exceptionally large, and may be too large to fit in a
processor's available memory, causing system crashes.

This is largely because no reuse of statement code is
possible: wherever a statement is used in the system, its
line of generated code must be reproduced, rather than
looking to a common pool of statement code in which
each statement's generated code is stored only once.

Furthermore, since every object is associated with a
memory variable, and the contents of these variables
can in some cases be lengthy strings of text, total system
memory resources can be exhausted in the course of
processing.

Finally, effective program control in such an implementation
is difficult to achieve, as illustrated in two
contexts. First, a common situation in expert systems is
that a user will wish to go back to an earlier object and
change an answer. To accommodate this situation in
this implementation, the processing of objects must be
interrupted, program control must return to an outside
loop, processing must begin again at the top of the main
body of the program, processing must then bypass all
objects preceding the object of interest, yet stop at the
appropriate object and reprocess it. This extra processing
can take a long time, and considerable code must be
generated around the basic blocks of nested "if" statements
in order to handle such situations correctly.

Another context which presents a difficult problem
for program control has to do with the nature of the
problem being solved. Problems involving the monitoring
of situations and reactive responses do not lend
themselves to such predetermined, procedural code. In
such cases, the system must be flexible enough to go
directly to an appropriate set of objects and process
them as a subset of the whole knowledge base, and then
return to its monitoring state. In the preferred implementation,
program control can be directed simply by
assigning a new object sequence number to the variable
that holds the number of the object currently being
processed, and processing then resumes from that object.

(2) Data-driven analysis

In another embodiment, data rather than code is generated.
This data is stored as records in a file (the "master"
file) in a relational database on a fixed disk, and
processing occurs by driving through this data until the
end of the data file is reached. Data is organized in
object sequence. Each data record consists of an encoded
criteria statement and pathway information.
Within each object, statements are organized by pathways
and by statement position within pathways. A
composite index key is built, holding the object sequence
number, criteria path number, and statement
position number within each pathway.

When driving through the master file, the beginning
of a new object is evidenced by a change in the object
sequence number. Consideration begins of the object's
first criteria pathway. Each statement record is examined
in turn, until a new pathway number is encountered.
If a statement evaluates to "false", records in the
file are skipped until a new pathway or new object is
encountered. Each statement record contains a statement
identifier. A relation is set using this identifier into
another file (the "resource" file), indexed by statement
identifier, in which the generated code for each statement
is stored. As each statement in the master file is
evaluated, the line of generated code for the statement is
retrieved from the resource file and evaluated for truth.
Object values are stored in an Object Values Data file,
and are retrieved and used as needed in order to evaluate
statements.

A relation is also set from the master file into the
objects file which holds all objects and their attributes,
using the statement identifier as an index and relational
key. If a pathway succeeds, general-purpose subroutines
are called to extract the appropriate attributes
from the objects file and process the object.

Criteria evaluation in this implementation is thus
transformed from an evaluation of generated "if" statements
to an examination of data records in a data file.
Memory problems, for both program size and for the
storage of object values in memory variables, are
greatly eased. Statement code is reused, in that it is
stored only once, in the resource file, and all statement
references by master file records are in the form of
pointers. Further, more program control is gained, because
the record pointer in the master file can be directly
manipulated, which cannot be accomplished with
the execution of conventional program code in the first
implementation. For example, to go back to an earlier
object and reprocess it, the record pointer is simply
repositioned in the file to the start of the appropriate
object's records, and processing resumes from that
point.

Despite these advantages, however, this implementation
can suffer from predictable performance problems.
In a large application, large numbers of data records
will be generated, and system performance can be hampered
by the frequent interactions with the disk drive
that are required in order to access the data. Performance
also suffers from the continual need to update
relational pointers among the different data files. These
problems may be addressed to some extent through the
use of faster and more powerful hardware, and the use
of disk caching, but they cannot be eliminated altogether.

(3) Hybrid approach: Generated data and generated
subroutines

This approach uses a combination of the above two
implementations. Program control is handled by the use
of data files, as in the second implementation, but for the
consideration of object criteria, generated subroutines
are called. For each object during system generation, a
dedicated subroutine is generated which holds the code
for the statements to be evaluated, and which returns a
result indicating whether the object should fire, based
on the execution of the generated code. Also, some
attribute data is assigned by the subroutine to common
system memory variables. (For example, the system
memory variable T would hold the text of the current 
object being processed.) The name of the object's sub- 
routine is some variant of the object's name. During 
processing, object records in a data file (one record per
object) are examined in object sequence, and their ap- 
propriate subroutines are called to determine if the ob- 
ject fires, and to acquire necessary attribute data. 

This implementation succeeds in combining the best
features of the first two implementations. Program control
is maintained by considering objects in a data file,
and repositioning the record pointer as needed. Performance
is facilitated by using generated "if" statements
rather than data records for criteria evaluation, and
assigning attribute data to variables in a procedure, to
avoid retrieval of such data from the objects file.
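
A rough Python sketch of the hybrid arrangement may help: object records are walked in sequence with a record pointer, and each record dispatches to a per-object subroutine that decides whether the object fires. All names here (`run`, `sub_age`, `sub_senior`, the `answers` argument, the AGE/SENIOR objects) are hypothetical stand-ins; the source does not specify the record layout or the subroutine signature.

```python
def sub_age(values, answers):
    """Stand-in for a generated subroutine that acquires a value."""
    values["AGE"] = answers["AGE"]
    return True                      # this object always fires


def sub_senior(values, answers):
    """Stand-in for a generated subroutine holding one criteria test."""
    return values["AGE"] >= 65


def run(records, subroutines, values, answers, start=0):
    """Walk object records from `start`, calling each object's subroutine."""
    fired = []
    pointer = start                  # record pointer into the object file
    while pointer < len(records):
        name = records[pointer]
        if subroutines[name](values, answers):
            fired.append(name)
        pointer += 1
    return fired


RECORDS = ["AGE", "SENIOR"]          # one record per object, in sequence
SUBROUTINES = {"AGE": sub_age, "SENIOR": sub_senior}
values = {"AGE": 0}

first_pass = run(RECORDS, SUBROUTINES, values, {"AGE": 40})
# The user goes back and changes an answer: reposition the pointer and resume.
second_pass = run(RECORDS, SUBROUTINES, values, {"AGE": 70},
                  start=RECORDS.index("AGE"))
print(first_pass, second_pass)
```

Going back to an earlier object is just a matter of choosing a new `start`, which is the control advantage the text attributes to keeping objects in a data file.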

Despite these advantages, some performance prob- 
lems remain. Large amounts of program code are still 
generated in the dedicated object subroutines, for again 
no reuse of statement code is possible. The system still
has potentially large data files, with resultant hardware 
performance limitations, and a relational pointer from 
the master file into the objects file is still necessary for 
some attribute data. 

Finally, in all three alternative implementations, the 
advantages of the preferred embodiment's pointer-based
approach are not enjoyed. All objects are evaluated
at the time they are reached in the processing sequence.
In the preferred implementation, only those
objects and statements which can be affected by ac- 
quired values are evaluated, resulting in significant im- 
provements in efficiency. 

I claim: 

1. A computing system including a processor and a
data set, the data set comprising a plurality of objects, 
each object having: 

(a) a data field which may contain a value, the value 
of the object, which value may be set to one of a 
plurality of possibilities by the processor,

(b) a relevance criterion which may be evaluated by 
the processor to yield a logical true or a logical 
false, and 

(c) an action, in the form of processor instructions or
a pointer to processor instructions, which will be
executed by the processor if evaluation of the rele- 
vance criterion by the processor produces a logical 
true; and 

(d) in one or more of the objects, the action of the 
object instructs the processor to set the value of the
object to one of the plurality of possibilities, and, 

(e) in one or more of the objects, the relevance crite- 
rion includes a reference to the value of another 
object such that the relevance criterion will or will 
not be satisfied depending on the value of the other
object.

2. The computing system of claim 1 in which the 
action of an object comprises making a change to an- 
other object. 

3. The computing system of claim 2 in which the
change to another object is a change in the relevance
criterion of the other object.

4. The computing system of claim 3 wherein the data
set has a specified sequence of the objects beginning
with a first object and ending with a last object, in
which sequence the relevance criterion of each object
makes no reference to the value of a later object in the
sequence.

5. The computing system of claim 4 in which the
change to another object changes the sequence of the
objects.

6. A computer method for performing a computation, 
by use of a processor, on a data set comprising a plural- 
ity of objects, each object having: 

(a) a data field which may contain a value, the value 
of the object, which value may be set to one of a 
plurality of possibilities by the processor, 

(b) a relevance criterion which may be evaluated by 
the processor to yield a logical true or a logical 
false, and 

(c) an action, in the form of processor instructions or 
a pointer to processor instructions, which will be 
executed by the processor if evaluation of the rele- 
vance criterion by the processor produces a logical 
true; 

the method having steps comprising: 

(d) determining the value of a first object,

(e) evaluating the relevance criterion of a second
object, which relevance criterion includes a dependency
on the value of the first object, and

(f) if the relevance criterion of the second object is
satisfied, executing the action specified by the object
and setting the value of the object.

7. The computer method of claim 6 in which the
action of an object comprises making a change to another
object.

8. The computer method of claim 7 in which the
change to another object is a change in the relevance
criterion of the other object.

9. The computer method of claim 8 wherein: 

the data set has a specified sequence of the objects 
beginning with a first object and ending with a last 
object, in which sequence the relevance criterion 
of each object makes no reference to the value of a 
later object in the sequence. 

10. The computer method of claim 9 in which the 
change to another object changes the sequence of the 
objects. 

11. A computing system including means for creating 
or changing a data set which may be processed on a 
computing system with a processor, the data set com- 
prising a plurality of objects, each object having: 

(a) a data field which may contain a value, the value 
of the object, which value may be set to one of a 
plurality of possibilities by the processor, 

(b) a relevance criterion which may be evaluated by 
the processor to yield a logical true or a logical 
false, and 

(c) an action, in the form of processor instructions or 
a pointer to processor instructions, which will be 
executed by the processor if evaluation of the rele- 
vance criterion by the processor produces a logical 
true; and 

(d) in one or more of the objects, the action of the 
object instructs the processor to set the value of the 
object to one of the plurality of possibilities, and, 

(e) in one or more of the objects, the relevance crite- 
rion includes a reference to the value of another 
object such that the relevance criterion will or will 
not be satisfied depending on the value of the other 
object.

12. The computing system of claim 11 in which the
action of an object comprises changing the relevance
criterion of another object.

13. The computing system of claim 12 wherein the
data set has a specified sequence of the objects beginning
with a first object and ending with a last object, in
which sequence the relevance criterion of each object
makes no reference to the value of a later object in the
sequence and the action of one or more objects changes
the sequence of the objects.

14. A computing system for generating validation 
tables from a data set which may be processed on a 
computing system with a processor, the data set com- 
prising a plurality of objects, each object having: 

(a) a data field which may contain a value, the value
of the object, which value may be set to one of a 
plurality of possibilities by the processor, 

(b) a relevance criterion which may be evaluated by 
the processor to yield a logical true or a logical 
false, and

(c) an action, in the form of processor instructions or 
a pointer to processor instructions, which will be 
executed by the processor if evaluation of the rele- 
vance criterion by the processor produces a logical 
true; and

(d) in one or more of the objects, the action of the 
object instructs the processor to set the value of the 
object to one of the plurality of possibilities, and, 

(e) in one or more of the objects, the relevance crite- 
rion includes a reference to the value of another 
object such that the relevance criterion will or will 
not be satisfied depending on the value of the other 
object; 

the computing system comprising:

(f) instructions for reading the contents of an object,
and

(g) instructions for generating a validation table from
the contents of the object.

15. The computing system of claim 14 in which the
action of an object comprises changing the relevance
criterion of another object.

16. The computing system of claim 15 wherein the
data set has a specified sequence of the objects beginning
with a first object and ending with a last object, in
which sequence the relevance criterion of each object 
makes no reference to the value of a later object in the 
sequence and the action of one or more objects changes 
the sequence. 

17. A computing system for interpreting into natural
language the contents of a data set which may be pro- 
cessed on a computing system with a processor, the data 
set comprising a plurality of objects, each object hav- 
ing: 

(a) a data field which may contain a value, the value
of the object, which value may be set to one of a 
plurality of possibilities by the processor, 

(b) a relevance criterion which may be evaluated by 
the processor to yield a logical true or a logical 
false, and

(c) an action, in the form of processor instructions or 
a pointer to processor instructions, which will be 
executed by the processor if evaluation of the rele- 
vance criterion by the processor produces a logical 
true; and

(d) in one or more of the objects, the action of the
object instructs the processor to set the value of the
object to one of the plurality of possibilities, and,

(e) in one or more of the objects, the relevance criterion
includes a reference to the value of another
object such that the relevance criterion will or will
not be satisfied depending on the value of the other
object;

the computing system comprising:

(f) instructions for reading the relevance criterion and
action of one of the objects, and

(g) instructions for interpreting the relevance crite- 
rion and action into natural human language, and 

(h) instructions for adding one or more natural human 
language words to make one or more complete 
natural human language sentences. 

18. The computing system of claim 17 in which the 
action of an object comprises changing the relevance 
criterion of another object. 

19. The computing system of claim 18 wherein the 
data set has a specified sequence of the objects begin- 
ning with a first object and ending with a last object, in 
which sequence the relevance criterion of each object 
makes no reference to the value of a later object in the 
sequence and the action of one or more objects changes 
the sequence. 

20. A computing system for examining a data set 
which may be processed on a computing system with a 
processor, the data set comprising a plurality of objects, 
each object having: 

(a) a data field which may contain a value, the value 
of the object, which value may be set to one of a 
plurality of possibilities by the processor, 

(b) a relevance criterion which may be evaluated by 
the processor to yield a logical true or a logical 
false, and 

(c) an action, in the form of processor instructions or 
a pointer to processor instructions, which will be 
executed by the processor if evaluation of the rele- 
vance criterion by the processor produces a logical 
true; and 

(d) in one or more of the objects, the action of the 
object instructs the processor to set the value of the 
object to one of the plurality of possibilities, and, 

(e) in one or more of the objects, the relevance crite- 
rion includes a reference to the value of another 
object such that the relevance criterion will or will 
not be satisfied depending on the value of the other 
object; 

the computing system comprising: 

(f) instructions for reading information from the data
set, and 

(g) instructions for determining what steps a com- 
puter would take upon execution of the actions 
specified by one or more of the objects. 

21. The computing system of claim 20 in which the 
action of an object comprises changing the relevance 
criterion of another object. 

22. The computing system of claim 21 wherein the 
data set has a specified sequence of the objects begin- 
ning with a first object and ending with a last object, in 
which sequence the relevance criterion of each object 
makes no reference to the value of a later object in the 
sequence and the action of one or more objects changes 
the sequence. 

***** 
ABSTRACT



A pattern recognition device having modifiable feature 
detectors (28) which respond to a transduced input signal 
(26) and communicate a feature activity signal (30) to allow 
classification and an appropriate output action (70). A 
memory (40) stores a set of comparison patterns, and is used 
by an assigner (66) to find likely features, or parts, in the 
current input signal (26). Each part is assigned to a feature 
detector (28[m]) judged to be responsible for it. An updater 
(42) modifies each responsible feature detector (28[m]) so as 
to make its preferred feature more similar to its assigned 
part. The modification embodies a strong constraint on the 
feature learning process, in particular an assumption that the 
ideal features for describing the pattern domain occur inde- 
pendently. This constraint allows improved learning speed 
and potentially improved scaling properties. 

A first preferred embodiment uses a group of noisy-OR type 
neural networks (50) to implement the feature detectors (28) 
and memory (40), and to obtain the parts by a soft segmen- 
tation of the current input signal (26). A second preferred 
embodiment maintains a lossless memory (40) separate from 
the feature detectors (28), and the parts consist of differences 
between the current input signal (26) and comparison pat- 
terns stored in the memory (40). 



20 Claims, 12 Drawing Sheets 
Fig. 3

For u = 0 to NUMUNITS[0]-1 {
    Set ACT[0][u] = INPUT[u]
}
For LAY = 1 to NUMLAYERS-1 {
    For u = 0 to NUMUNITS[LAY]-1 {
        Set ACT[LAY][u] = 0
    }
}

For LAY = 0 to NUMLAYERS-1 {
    For u = 0 to NUMUNITS[LAY]-1 {
        Set ORDER[LAY][u] = a random integer between 0 and
            NUMUNITS[LAY]-1, inclusive, without replacement
    }
}

For i = 0 to NUMUNITS[NUMLAYERS-1]-1 {
    Set PROBOFF[NUMLAYERS-1][i] = 1 - WEIGHT[NUMLAYERS-1][i][0]
}
For LAY = 0 to NUMLAYERS-2 {
    For i = 0 to NUMUNITS[LAY]-1 {
        Set PROBOFF[LAY][i] = 1
    }
}

For i = 0 to NUMUNITS[0]-1 {
    Set NETOFFBELOW[0][i] = 0
}
For LAY = 1 to NUMLAYERS-1 {
    For j = 0 to NUMUNITS[LAY]-1 {
        SUM = 0
        For i = 0 to NUMUNITS[LAY-1]-1 {
            If ACT[LAY-1][i] == 0 {
                Set SUM = SUM - log(1 - WEIGHT[LAY-1][i][j])
            }
        }
        Set NETOFFBELOW[LAY][j] = SUM
    }
}

For LAY = 0 to NUMLAYERS-1 {
    For i = 0 to NUMUNITS[LAY]-1 {
        Set UNITPROB[LAY][i] = 1
        Set COUNT[LAY][i] = 0
        Set COUNTBIAS[LAY][i] = 0
    }
}

( Stop )



Fig. 5 
For LAY = 0 to NUMLAYERS-1 {
    For u = 0 to NUMUNITS[LAY]-1 {
        Set i = ORDER[LAY][u]
        If value of ACT[LAY][i] is not clamped {
            Select a new value for ACT[LAY][i]
                using Gibbs Sampling (see Fig. 7)
            If value of ACT[LAY][i] changed {
                If LAY > 0 {
                    For k = 0 to NUMUNITS[LAY-1]-1 {
                        If ACT[LAY][i] == 1 {
                            Set PROBOFF[LAY-1][k] =
                                PROBOFF[LAY-1][k] * (1 - WEIGHT[LAY-1][k][i])
                        }
                        Else {
                            Set PROBOFF[LAY-1][k] =
                                PROBOFF[LAY-1][k] / (1 - WEIGHT[LAY-1][k][i])
                        }
                        If PROBOFF[LAY-1][k] > 1 {
                            Set PROBOFF[LAY-1][k] = 1
                        }
                    }
                }
                If LAY < NUMLAYERS-1 {
                    For j = 0 to NUMUNITS[LAY+1]-1 {
                        If ACT[LAY][i] == 1 {
                            Set NETOFFBELOW[LAY+1][j] =
                                NETOFFBELOW[LAY+1][j] + log(1 - WEIGHT[LAY][i][j])
                        }
                        Else {
                            Set NETOFFBELOW[LAY+1][j] =
                                NETOFFBELOW[LAY+1][j] - log(1 - WEIGHT[LAY][i][j])
                        }
                    }
                }
            }
        }
    }
}



Fig. 6 
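
The incremental PROBOFF bookkeeping in Fig. 6 can be checked against a direct recomputation. The Python sketch below assumes a noisy-OR reading in which a lower unit's probability of being off is the product of (1 - weight) over the active units above it; the function names are hypothetical, and the bias term that Fig. 5 folds into the top layer is left out for brevity.

```python
def prob_off(weights_k, acts_above):
    """Recompute the off-probability of one lower unit from scratch.

    weights_k[i] is the weight from unit i in the layer above to this
    lower unit; acts_above[i] is that upper unit's 0/1 activity.
    """
    p = 1.0
    for w, a in zip(weights_k, acts_above):
        if a == 1:
            p *= (1.0 - w)
    return p


def update_prob_off(p, w, new_act):
    """Apply the Fig. 6 incremental update after one upper unit changes.

    Multiply by (1 - w) when the unit turned on, divide when it turned
    off, then clamp at 1 as the figure does.
    """
    p = p * (1.0 - w) if new_act == 1 else p / (1.0 - w)
    return min(p, 1.0)


weights_k = [0.5, 0.25]
p = prob_off(weights_k, [1, 0])          # only unit 0 is on
p = update_prob_off(p, weights_k[1], 1)  # unit 1 turns on
print(p, prob_off(weights_k, [1, 1]))
```

The incremental form touches one weight per change instead of every active unit above, which is presumably why the figure maintains PROBOFF this way.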
( Start )

Set NET = log(1 - MIN(0.99, PROBOFF[LAY][i]))
          - log(MIN(0.99, PROBOFF[LAY][i]))

Set NET = NET - NETOFFBELOW[LAY][i]

If ACT[LAY][i] == 1 {
    Set NET = -NET
}

For k = 0 to NUMUNITS[LAY-1]-1 {
    If ACT[LAY-1][k] == 1 {
        If ACT[LAY][i] == 0 {
            Set PW = 1 - (PROBOFF[LAY-1][k] * (1 - WEIGHT[LAY-1][k][i]))
        }
        Else {
            Set PW = 1 - (PROBOFF[LAY-1][k] / (1 - WEIGHT[LAY-1][k][i]))
        }
        Set P = 1 - PROBOFF[LAY-1][k]
        If PW <= 0 {
            If P > 0 {
                Set NET = NET - 1000
            }
        }
        Else {
            If P > 0 {
                Set NET = NET + log(PW/P)
            }
            Else {
                Set NET = NET + 1000
            }
        }
    }
}

Set RAND = a uniform random real value between 0 and 1

If RAND < (1 / (1 + exp(-NET))) {
    If ACT[LAY][i] == 1 {
        Set ACT[LAY][i] = 0
    }
    Else {
        Set ACT[LAY][i] = 1
    }
}

( Stop )



Fig. 7 
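
The final step of Fig. 7, flipping a unit with probability 1 / (1 + exp(-NET)), can be sketched in Python. The helper names `sigmoid` and `maybe_flip` are assumptions for illustration. The two-branch logistic is a defensive choice that is not in the figure: NET can reach values near -1000 through the figure's penalty terms, and a direct `math.exp(1000)` would raise an overflow error in Python.

```python
import math
import random


def sigmoid(x):
    """Logistic function written so that large |x| cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)            # x < 0, so this underflows toward 0 safely
    return z / (1.0 + z)


def maybe_flip(act, net, rng=random):
    """Flip a 0/1 activation with probability sigmoid(net).

    `net` is the log-odds in favor of flipping, as accumulated in Fig. 7.
    `rng` only needs a .random() method returning a float in [0, 1).
    """
    if rng.random() < sigmoid(net):
        return 1 - act
    return act


rng = random.Random(0)
print(maybe_flip(1, 1000.0, rng))    # strongly favored flip
print(maybe_flip(0, -1000.0, rng))   # strongly disfavored flip
```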
For LAY = 0 to NUMLAYERS-1 {
    For i = 0 to NUMUNITS[LAY]-1 {
        If ACT[LAY][i] == 1 {
            Set UNITPROB[LAY][i] =
                UNITPROB[LAY][i] * (1 - PROBOFF[LAY][i])
        }
        Else {
            Set UNITPROB[LAY][i] =
                UNITPROB[LAY][i] * PROBOFF[LAY][i]
        }
    }
}



Fig. 8 



11/02/2003, EAST Version: 1.4.1 



U.S. Patent May 2, 2000 Sheet 9 of 12 6,058,206 



Set LRATE = 1
For LAY = 1 to NUMLAYERS-1 {
    For i = 0 to NUMUNITS[LAY]-1 {
        Set COUNTBIAS[LAY][i] = COUNTBIAS[LAY][i] + 0.05
        If ACT[LAY][i] == 1 {
            Set COUNT[LAY][i] = COUNT[LAY][i] + 0.05
            For k = 0 to NUMUNITS[LAY-1]-1 {
                Set D = -1
                If ACT[LAY-1][k] == 1 {
                    Set D = D + (1 / (1 - PROBOFF[LAY-1][k]))
                }
                Set WPRE = WEIGHT[LAY-1][k][i]
                Set LPRE = -log(1 - WEIGHT[LAY-1][k][i])
                Set WEIGHT[LAY-1][k][i] = WEIGHT[LAY-1][k][i] +
                    (D * LRATE * WEIGHT[LAY-1][k][i] / COUNT[LAY][i])
                If WEIGHT[LAY-1][k][i] > 0.99 { Set WEIGHT[LAY-1][k][i] = 0.99 }
                If WEIGHT[LAY-1][k][i] < 0.01 { Set WEIGHT[LAY-1][k][i] = 0.01 }
                Set PROBOFF[LAY-1][k] = PROBOFF[LAY-1][k] / (1 - WPRE)
                Set PROBOFF[LAY-1][k] = PROBOFF[LAY-1][k] *
                    (1 - WEIGHT[LAY-1][k][i])
                If PROBOFF[LAY-1][k] > 1 { Set PROBOFF[LAY-1][k] = 1 }
                If ACT[LAY-1][k] == 0 {
                    Set NETOFFBELOW[LAY][i] = NETOFFBELOW[LAY][i] +
                        (-log(1 - WEIGHT[LAY-1][k][i]) - LPRE)
                }
            }
        }
        If last cycle and LAY == NUMLAYERS-1 {
            Set D = -1
            If ACT[LAY][i] == 1 {
                Set D = D + (1 / (1 - PROBOFF[LAY][i]))
            }
            Set WEIGHT[LAY][i][0] = WEIGHT[LAY][i][0] +
                (D * LRATE * WEIGHT[LAY][i][0] / COUNTBIAS[LAY][i])
            If WEIGHT[LAY][i][0] < 0.01 { Set WEIGHT[LAY][i][0] = 0.01 }
            If WEIGHT[LAY][i][0] > 0.25 { Set WEIGHT[LAY][i][0] = 0.25 }
            Set PROBOFF[LAY][i] = 1 - WEIGHT[LAY][i][0]
        }
    }
}



Fig. 9 
( Start )

Select a training pattern, randomly without replacement
(recycle set as necessary)

Select a comparison pattern; store it in COMPAREPAT[ ]

For n = 1 to N {
    Set DIFF[n] = TRAINPAT[n] & (! COMPAREPAT[n])
}

Set MIN = maximum possible distance
Set IMIN = -1
For m = 1 to M {
    Set DIST = 0
    For n = 1 to N {
        Set D = WEIGHT[m][n] - DIFF[n]
        Set DIST = DIST + (D * D)
    }
    Set DIST = sqrt(DIST)
    If DIST < MIN {
        Set MIN = DIST
        Set IMIN = m
    }
}

Set LRATE = 1.0 * (1 / NUMPATS) * (1 / ITRIAL)
For n = 1 to N {
    Set D = DIFF[n] - WEIGHT[IMIN][n]
    Set WEIGHT[IMIN][n] = WEIGHT[IMIN][n] + (LRATE * D)
}

( Stop )



Fig. 12 
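
One training step of Fig. 12 can be written compactly in Python. This is a sketch under stated assumptions: patterns are lists of 0/1 integers, `weights` is a list of M rows of N floats, indices are 0-based rather than the figure's 1-based, and the learning rate is passed in by the caller instead of being derived from NUMPATS and ITRIAL. The function name `train_step` is hypothetical.

```python
import math


def train_step(weights, train_pat, compare_pat, lrate):
    """Move the closest feature detector toward the difference pattern.

    DIFF keeps the bits that are on in the training pattern and off in
    the comparison pattern. The detector whose weight row is nearest to
    DIFF (Euclidean distance, first one wins on ties) is nudged toward it.
    Modifies `weights` in place and returns (winner index, DIFF).
    """
    diff = [t & (1 - c) for t, c in zip(train_pat, compare_pat)]
    dists = [
        math.sqrt(sum((w - d) ** 2 for w, d in zip(row, diff)))
        for row in weights
    ]
    imin = min(range(len(weights)), key=dists.__getitem__)
    weights[imin] = [w + lrate * (d - w) for w, d in zip(weights[imin], diff)]
    return imin, diff


weights = [[0.0, 0.0, 0.0], [0.5, 0.0, 1.0]]
winner, diff = train_step(weights, [1, 1, 1], [0, 1, 0], lrate=0.5)
print(winner, diff, weights[winner])
```

Only the winning row changes on each step, so each detector drifts toward the difference patterns it is repeatedly judged responsible for.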
PATTERN RECOGNIZER WITH
INDEPENDENT FEATURE LEARNING

BACKGROUND - FIELD OF INVENTION

This invention relates to pattern recognition methods and
machines, specifically to an improved method and machine
for training feature-based pattern recognizers.

BACKGROUND - DISCUSSION OF PRIOR ART

Pattern recognizers can be used in many ways, all of
which involve automatically responding to some physical
pattern in the world. For example, the physical pattern might
be speech sound waves, in which case a pattern recognition
device might be used to output the same utterance but in a
different language. Or the physical pattern might be the
locations of vehicles on a particular highway, and the pattern
recognizer might then be used to control the traffic lights on
that highway so as to minimize congestion.

Often it is desirable to apply a pattern recognizer to a task
which is poorly understood, or even a task which changes
over time. In such circumstances, an adaptive pattern
recognizer, which learns the task based on a sequence of
examples, can work much better than a "hard-wired"
(non-adaptive) one. Also, like adaptivity, "feature based"
recognition can be very advantageous, in general because it tends
to be more noise-tolerant than other approaches (such as
fixed template matching). Feature based recognition
involves responding to a set of features, or characteristics,
which are determined to exist within the pattern. For
example, if the patterns were speech waveforms, the features
detected might include "a 'k' sound", or "a high-amplitude
frequency within the twenty-eighth time interval". In a
recognizer which is adaptive as well as feature-based, the
features may even be very complex and difficult to describe
in human language.

The device of the present invention is both adaptive and
feature-based. One of the most difficult problems in designing
such pattern recognizers is determining the best set of
features, or more precisely, determining how the recognizer
should be trained so that it will learn the best set of
features. Very often, once a good set of features is found, the
recognition problem becomes trivial.

One approach to learning good features is the use of a
neural network trained with the backpropagation method
(Rumelhart, Hinton, & Williams, 1986, "Learning internal
representations by error backpropagation", in Parallel
Distributed Processing: Explorations in the Microstructure of
Cognition, MIT Press, Cambridge, Mass.). However, this
approach (and many related gradient-based neural net
methods) can be very slow to learn, especially with networks
having many layers of neurons. It is also quite possible that
it will not learn optimal, or even nearly optimal, features.
This is because it is based on a hill-climbing type of learning
which can get stuck in a "valley" very far from the globally
optimal solution. The result might be features which work
well on the training examples, but not on new examples (i.e.,
poor generalization of learning).

There have been many attempts to improve on the learning
speed or the generalization ability of these neural
network pattern recognizers, but typically such improvements
either do not solve both of these problems at once, or
do not result in significant improvement on a usefully wide
range of tasks. Arguably, the solutions which work best,
though, tend to be ones which impose constraints on the
[...] certain feature sets can be (or are likely to be) learned. Such
constraints effectively reduce the amount of "feature space"
which must be searched by the learning process, making the
process both faster and less susceptible to getting stuck with
bad features.

One example of constraint-based feature learning is that
of Simard et al. (e.g. Hastie, Simard, & Sackinger, 1995,
"Learning prototype models for tangent distance", in
Advances in Neural Information Processing Systems 7, MIT
Press, Cambridge, Mass.). Their neural network-type
method is applied to character (e.g. handwriting) recognition.
In effect, their approach is to force the network to
automatically generalize anything it learns about a particular
example character to all possible "transformed" versions of
the character. Here, the transformations include stretching,
shrinking, slanting, and the like. While this does significantly
improve generalization (and learning speed, since a
smaller set of examples is required), the solution is rather
specific to things such as writing. It wouldn't directly apply
to speech waveforms, for example. A further disadvantage of
this solution is that it only applies at the input level, where
[illegible] are first input to the network. It doesn't
apply at the internal layers of a multilayer network, because
the internal features are learned, and thus it is not clear
how to apply the constraints (e.g. slant-independence) to
them. Similarly, this method doesn't address the problem
which neural networks have in scaling up to large numbers of
features. This scaling problem (which creates prohibitively
long training times) results from the exponentially increasing
number of possible feature combinations, and must be
solved at all levels of feature detection in order to be
significantly reduced.

OBJECTS AND ADVANTAGES

Accordingly, my invention has several objects and advantages
over the prior art pattern recognition methods. Like
backpropagation and some other neural network training
methods, my invention may be used for adaptive learning
within a feature based pattern recognition machine. It
improves upon these previous methods however, in that it
provides a strong constraint on the learning, thus reducing
the learning time and reducing the likelihood of poor features
being learned. This constraint is based on an assumption
that the ideal (or "true") features occur independently
(are not "correlated") in the set of physical patterns.
Ironically, this assumption has often been invoked in the
prior art, but prior to my invention has not been used to its
fullest extent.

My method makes much more extensive use of the
independent features assumption, making it very powerful.
Because this assumption is not limited to a particular class
of pattern recognition tasks (e.g. only optical character
recognition), my invention's advantages are likely to be
obtained on a wide variety of tasks. Furthermore, the
assumption is actually more powerful when more feature
detectors are used. This allows for potentially improved
scaling of the method to larger recognizers, which has long
been a goal of the neural network research community. Still
further, the independent features assumption can be applied
at every layer of a hierarchical, multilayer recognition
device. This gives my device even more ability to speed
learning and improve generalization when compared to
constraint-based procedures which apply only at the input
layer.

learning process; that is, based on some assumptions about A further object of my invention is that multiple similar 

the task at hand, they constrain the learning so that only recognition systems can be created by training one such 
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system and transferring the resulting trained weights to 
another system. 

Further objects and advantages of my invention will 
become apparent from a consideration of the drawings and 
ensuing description. 

DESCRIPTION OF DRAWINGS 

FIG. 1 is a block diagram of a pattern recognition system 
according to the present invention, showing subsystems 
common to both preferred embodiments. 

FIG. 2 is a flow diagram of an overall procedure for 
operating the preferred embodiments. 

FIG. 3 is a block diagram showing the structure of the first
preferred embodiment. 

FIG. 4 is a flow diagram of an overall procedure for 
operating the first preferred embodiment. 

FIG. 5 is a flow diagram depicting the initialization of 
parameters for the first preferred embodiment. 

FIG. 6 is a flow diagram of a single cycle of Gibbs 
sampling as performed in the first preferred embodiment. 

FIG. 7 is a flow diagram of a procedure for selecting a 
new unit activation within a cycle of the Gibbs sampling 
process of the first preferred embodiment. 

FIG. 8 is a flow diagram of a procedure for updating a 
unit's contribution to the likelihood within a cycle of the 
Gibbs sampling process of the first preferred embodiment. 

FIG. 9 is a flow diagram of a procedure for updating 
connection weights within a cycle of the Gibbs sampling 
process of the first preferred embodiment. 

FIG. 10 is a block diagram showing the structure of the 
second preferred embodiment. 

FIG. 11 is a flow diagram of an overall procedure for 
operating the second preferred embodiment. 

FIG. 12 is a flow diagram of a procedure for training the 
feature detectors of the second preferred embodiment. 

LIST OF REFERENCE NUMERALS 

20 Environment and/or Further device 
22 Physical pattern 
24 Transducer 
26 Input signal 
28 Feature detectors 
30 Feature activity signal 
32 Feature description signal 
34 Classifier 
36 Output signal 
38 Effector 
40 Memory 
42 Updater 

44 Part mapping signal 
46 Target signal 
50 Class networks 
64 Updating signal 
66 Assigner 
68 Retrieval signal 
70 Action 

SUMMARY 

In accordance with the present invention a pattern recog- 
nition device comprises a sensory transducer, a group of 
feature detectors, a classifier, and an effector to automati- 
cally respond to physical patterns. The device further com- 
prises the improvement wherein an assigner uses previous 
input patterns, as stored in a memory, to segment a current
input pattern into parts corresponding to the feature
detectors, and at least one feature detector is modified so as 
to increase its preference for its assigned part. 

Theory of the Invention 

I believe there are interesting theoretical reasons for the 
advantages of my invention. This section describes this 
theory as I currently understand it. 

Machines and/or methods of pattern recognition which 
are feature-based can be very powerful. For example, con- 
sider a recognizer for printed characters which has detectors 
for such features as "horizontal line at the top", "vertical line 
on the right", etc. One reason this is a powerful approach is 
that a relatively small number of such feature detectors can 
cooperate to recognize a large number of possible charac- 
ters. Indeed, the number of different recognizable characters 
increases exponentially with the number of features 
(although this exponential increase is both a blessing and a 
curse, as will be explained shortly). For example, using only 
20 binary (on/off) features, over a million possible patterns 
can be recognized; with 1000 features, the number of 
possibilities is almost too hard to comprehend — and this is 
still a puny number compared to the number of neurons in 
a human brain! 
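
As a quick check of the counting argument above, the following minimal sketch computes the number of on/off combinations for a given number of binary features. The function name is illustrative only and does not appear in the disclosure.

```python
def num_feature_patterns(num_binary_features: int) -> int:
    """Number of distinct on/off combinations of the given number of binary features."""
    return 2 ** num_binary_features

# 20 binary features already give "over a million" combinations.
patterns_20 = num_feature_patterns(20)

# With 1000 features the count is a number with hundreds of decimal digits.
digits_1000 = len(str(num_feature_patterns(1000)))

print(patterns_20, digits_1000)
```

The count is an upper bound on distinguishable feature combinations; how many of them correspond to useful patterns depends on the domain.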

Another advantage of feature based recognition is noise 
tolerance. Essentially, if "enough" of the features in a pattern 
are detected, recognition can be good even though feature 
detection is not. For example, a capital "A" could be 
recognized even if the "horizontal bar in the middle" was 
missed (perhaps due to a bad printer), simply because "A" 
is the only letter with the (detected) features, "right-leaning 
diagonal on the left", "left-leaning diagonal on the right", 
and "intersecting line segments at the top". There are many 
possible feature sets which might be used for character 
recognition, but these serve to illustrate the basic point of 
fault-tolerance. 
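
The fault-tolerance point can be illustrated with a small sketch. The feature names for "A" are taken from the text; the table itself, the entries for "H" and "V", and the matching rule (a character matches if it contains every detected feature) are assumptions made only for this example.

```python
# Hypothetical feature table, for illustration only.
CHARACTER_FEATURES = {
    "A": {"right-leaning diagonal on the left",
          "left-leaning diagonal on the right",
          "intersecting line segments at the top",
          "horizontal bar in the middle"},
    "H": {"vertical line on the left",
          "vertical line on the right",
          "horizontal bar in the middle"},
    "V": {"left-leaning diagonal on the left",
          "right-leaning diagonal on the right",
          "intersecting line segments at the bottom"},
}

def recognize(detected):
    """Return the characters whose feature sets contain every detected feature."""
    return [ch for ch, feats in CHARACTER_FEATURES.items() if detected <= feats]

# The middle bar was missed (say, by a bad printer), yet "A" is still the only match.
detected = {"right-leaning diagonal on the left",
            "left-leaning diagonal on the right",
            "intersecting line segments at the top"}
matches = recognize(detected)
print(matches)
```

A single feature such as the middle bar is ambiguous on its own in this table; it is the combination of detected features that singles out the character.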

As powerful as feature based recognition can be, it is still 
much more powerful when the features can be learned from 
examples, rather than being hardwired by a human designer. 
Such adaptivity underlies the recent research interest in 
neural networks, for example, which in their most typical 
form are just successive layers of (adaptive) feature detec- 
tors. Indeed, many would argue that human intelligence is so 
impressive partly because it is based on naturally occurring 
adaptive neural networks with billions of neurons, wherein 
each neuron can be viewed as a feature detector. 

However, the power of adaptive feature based recognition 
has always come at a price. In particular, learning the 
features can be very slow, and can result in suboptimal 
features being learned. Moreover, this problem seems to get 
worse, the greater the number of feature detectors being 
trained. This is the "curse" aspect of the exponentially 
increasing number of feature combinations, as alluded to 
above. 

I believe, though, that the curse is not as bad as the prior 
art literature would suggest. Indeed, I believe it can't be, or 
else human brains, with their billions of feature-detecting 
neurons, could not learn nearly as fast as they do. I further 
believe that my invention makes use of a principle which is 
also used by human brains. This principle is what I call 
"independent feature learning". 

Most prior art recognizers (adaptive feature-based ones) 
perform training of the feature detectors in an essentially 
similar way: They first attempt to identify which (important) 
features are contained in the current input pattern. Then they 
modify all the feature detectors so as to make the entire
recognizer better at detecting that particular combination of
features. Thus if a "T" was observed, the combination of
features "top horizontal bar" and "middle vertical bar" might
be reinforced. Importantly, this means that now whenever
"top horizontal bar" is observed, "middle vertical bar" will
be considered more likely as well, and vice versa. The
recognizer has been taught that these two features are (to
some extent) correlated in the set of possible input patterns.

The essence of my invention, on the other hand, is the
assumption that features are not correlated; rather, they are
assumed to be statistically independent of one another
throughout the set of input patterns. An embodiment of my
invention, upon observing the "T", might train one feature
detector to better respond to "top horizontal bar", and
another to increase its preference for "middle vertical bar",
but without increasing the preference of any detector for the
combination of these two features.

Why should this be a good training method? Because in
the absence of evidence to the contrary, it is a good first
guess that any given feature could occur within any other
combination of features. A recognizer which has just
"discovered" the "top horizontal bar" feature, for example, could
also find this feature useful when it later encounters "E",
"F", "I", "Z", "5", "7", and perhaps other symbols too. But
if it had been trained that "top horizontal bar" implies
"middle vertical bar" as well (as prior art recognizers
typically learn when observing a "T"), it would later have to
unlearn this information when it encountered the other
symbols. In essence, my device is advantageous because it
does not require such unlearning. Indeed, I believe that in a
typical prior art training regime, the appropriate unlearning
of spurious correlations often never occurs, because the
amount of training patterns is just too small. Thus I believe
my device can not only learn with many fewer pattern
observations, but it can also learn better features based on
those observations.

Moreover, I believe this advantage becomes more and
more important as the number of feature detectors increases.
This is because the number of feature combinations
increases exponentially, so in some sense the amount of
inappropriate correlation learning done by prior art
recognizers also increases exponentially, and thus so does the
amount of unlearning that must be done. This means my
device has the potential for improved scaling to large
numbers of feature detectors.

One might argue that the assumption of independently
occurring features, which leads to my approach of training
the feature detectors independently, is not necessarily
appropriate in all situations. For example, what if (in some
strange alphabet) the letter "T" was the only letter with a
"top horizontal bar" or a "middle vertical bar"? Surely then
it would be appropriate to train the recognizer that these
features always occur together, wouldn't it? One answer to
this is that in such a situation, the entire "T" would be a more
appropriate feature to learn. More generally, features which
are highly correlated with other features tend not to be very
useful anyway; they tend to waste feature storage capacity in
the recognizer. In any case, though, my invention does not
prevent the learning of correlations among features. It
simply makes independence of features the default assumption;
this assumption can still be "overruled" by further learning.

Some prior art methods have invoked the principle of
independence, in an attempt to encourage learning of
"factorial", or "information preserving" internal
representations. One example is that of Foldiak (1990, "Forming
sparse representations by local anti-Hebbian learning", in
Biological Cybernetics, 64:165-170), which incorporates
"competitive" connections between the feature detectors to
encourage them to learn different features. However, these
prior art methods have not made use of the independence
assumption nearly to the degree possible. For example,
systems like Foldiak's which incorporate competitive
connections can only discourage second-order dependencies
(correlations), not higher-order ones, as does my device.
Also, these systems often perform "batch" training, where
weight changes are saved, and actually performed only after
a set of pattern presentations. Unlike my device, such
procedures do not allow the learning done on a current
pattern to be immediately used for assisting future learning.

Furthermore, I believe it is typical of these prior art
methods that an essentially distinct subsystem (e.g. separate
connections, or an additional penalty term in the training
cost function) is used to counteract the effects of an
otherwise conventional training procedure. In such a method, I
believe, the countermeasure will always lag behind the
principal, error-reducing weight updates. My invention, in
contrast, embeds the independence assumption in the
principal (and only) weight update procedure; thus the
dependencies need not be learned by a separate subsystem in order
to be (later) removed.

Commonalities of the Preferred Embodiments

This section describes aspects of the invention which are
common to both of my preferred embodiments, with
reference to FIGS. 1 and 2. Details of preferred embodiments
are given below.
Overview

FIG. 1 illustrates an overview of a pattern recognition
device according to the invention. Direction of signal
propagation is indicated by arrows. Propagation of multiple-
element signals should be logically parallel for the most part,
meaning most (and preferably all) of a signal's elements
(each of which represents a scalar value) should be received
by a corresponding processing stage before that stage
performs its operations and outputs its results. For clarity, each
signal communication line is given a reference numeral label
identical to that of the signal which uses that line.

Operation of the system (described further below) is
regulated by a user, which may be simply a human user or
may be or include a further device. This regulation includes
the orientation of the system such that it interacts with an
environment and/or further device 20 (referred to hereafter
as simply the environment 20). This orientation is such that
a physical pattern 22 is communicated from the environment
20 to the system, in particular to a transducer 24. The
transducer 24 converts the physical pattern 22 into a
representation of that physical pattern 22, which conversion I take
to include any "preprocessing" operations. This
representation takes the form of an input signal 26. I will often refer
herein to the information represented by a given input signal
26 as an "input pattern".

A group of feature detectors 28 is connected to receive the
input signal 26 from the transducer 24. Each of the feature
detectors 28 is configured to detect or "prefer" a certain
feature vector when it occurs in the input signal 26. Each
feature detector 28[m] outputs a corresponding feature
activity signal element 30[m] representative of the (scalar) degree
to which it has detected its feature within the current input
signal 26 (i.e., the degree to which the detector "fires", or
finds a good "match" in the input signal 26). In some
embodiments this feature activity signal element 30[m] may
reflect the results of competition or other communication
between the feature detectors 28. Each feature detector
28[m] is also configured to output a multiple-element feature
description signal element 32[m], which is representative of
the feature the detector 28[m] prefers.
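
The two outputs of a feature detector can be sketched as follows. This is an illustration only: the class layout, the use of one weight per input element, and the "mean agreement" activity measure are assumptions for this example, not the detectors of the preferred embodiments.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FeatureDetector:
    """Toy stand-in for one feature detector 28[m] (structure assumed)."""
    preferred: List[float]  # the preferred feature, one weight per input element

    def activity(self, input_signal: List[int]) -> float:
        """Scalar activity 30[m]: here, the mean agreement with a binary input."""
        agree = [p if x == 1 else 1.0 - p
                 for p, x in zip(self.preferred, input_signal)]
        return sum(agree) / len(agree)

    def description(self) -> List[float]:
        """Feature description 32[m]: a copy of the preferred feature."""
        return list(self.preferred)

detectors = [FeatureDetector([1.0, 1.0, 0.0, 0.0]),
             FeatureDetector([0.0, 0.0, 1.0, 1.0])]
INPUT = [1, 1, 0, 0]

# Feature activity signal 30: one scalar per detector.
feature_activity = [d.activity(INPUT) for d in detectors]
print(feature_activity)
```

The first detector matches the input exactly and the second not at all, so the activities fall at the two extremes of the assumed 0 to 1 scale.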

A classifier 34 is connected to receive the feature activity 
signal 30 from the feature detectors 28. The classifier 34 is 
also connected to receive a target signal 46 from the envi- 
ronment 20. The classifier 34 is configured, via training 
(using the target signal 46) and/or hardwiring, to produce an 
output signal 36 representative of an appropriate system 
response given the feature activity signal 30. For example, 
the output signal 36 may represent degrees of membership 
in various classes, such as the probability a handwritten 
character input is 'A', 'B', 'C', etc. It is important to note
that the classifier 34, while named such to reflect its typical 
use, need not actually perform a classification per se. The 
important thing is that the output signal 36 it produces 
represents an appropriate system response, whether or not it 
also represents a class label. 

An effector 38 is connected to receive the output signal 
36, and is configured to take some action 70 in the world 
based on that signal 36. For example, if the system is being 
used to recognize handwritten characters, the effector 38 
might store an ASCII representation of the most probable 
character into computer memory, perhaps to allow a user to 
send email using a device too small for a keyboard. 
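
A minimal sketch of such a storage-type effector is given below. The class order in CLASS_LABELS, the function name, and the use of a plain list as the "computer memory" are hypothetical; only the idea of storing the ASCII code of the most probable character comes from the text.

```python
CLASS_LABELS = ["A", "B", "C"]   # hypothetical class order for the output signal

def store_most_probable(OUTPUT, storage):
    """Append the ASCII code of the most probable character to storage."""
    best = max(range(len(OUTPUT)), key=lambda i: OUTPUT[i])
    storage.append(ord(CLASS_LABELS[best]))
    return CLASS_LABELS[best]

storage = []
chosen = store_most_probable([0.1, 0.7, 0.2], storage)
print(chosen, storage)
```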

The feature detectors 28 are trained using a memory 40, 
an assigner 66, and an updater 42. The memory 40 is 
connected to receive the input signal 26. The memory 40 is 
capable of storing, possibly in an approximate ("lossy") 
way, a representation of a set of previous input patterns, 
which are called "comparison patterns" with respect to the 
current and future input signals 26. 

The assigner 66 is connected to access the stored contents 
(comparison patterns) of the memory 40 via a retrieval 
signal 68. It is capable of using this storage to segment the 
current input pattern (as represented by the current input 
signal 26) into parts. Each part represents a vector which is 
judged by the assigner 66 to be a likely feature which is not 
only contained in the input signal 26, but is also likely to be 
useful for describing the collection of stored comparison 
patterns in the memory 40. Put differently, a part is a vector 
which is judged likely to be a true feature of the entire input 
domain, which includes past patterns, the current pattern, 
and (hopefully) future patterns as well. 

The assigner 66 is connected to receive the feature 
description signal 32 and makes use of this signal 32 to 
create a part mapping signal 44, representative of a corre- 
spondence between feature detectors 28 and the parts. As 
described later, the memory 40 may also make use of the 
feature detectors 28 in storing input patterns. Furthermore, 
the assigner 66 may make use of the target signal 46 in 
creating the part mapping signal 44. The parts themselves 
may be explicitly represented (internally) by the assigner 66, 
or may only be represented implicitly in the part mapping 
signal 44. 

The updater 42 is connected to receive the part mapping 
signal 44 from the assigner 66. It is configured to modify the 
feature detectors 28 based on this signal 44. In particular, the 
updater 42 can modify a feature detector 28[m] so as to 
increase the preference that the feature detector 28[m] has 
for the part corresponding to it. Put differently, the preferred 
feature of the feature detector 28[m] is moved toward, or
made more similar to, the part assigned to it. The influence 
of the updater 42 is indicated in FIGS. 1, 3, and 10 by an 
updating signal 64. In some non-preferred embodiments, 
however, the feature updates might be made directly (e.g. via 
hardware), with no intervening updating signal 64 being 
required.

FIG. 2 illustrates an overview of the operation of a pattern
recognition device according to the invention. Use of the
device comprises a sequence of "trials", or physical pattern
presentations. On each trial, either recognition is performed,
or training is performed, or both are performed. In my
preferred embodiments, recognition (if enabled) is per-
formed before training (if enabled). However, many useful
embodiments might exist wherein training is done before or
simultaneous with recognition.

Both recognition and training require the physical pattern
22 to be observed, and a representative input signal 26 to be
produced by the transducer 24. Other steps depend on
whether recognition and/or training are enabled.

The schedule of enablement of training and recognition
over trials is discussed below for each embodiment sepa-
rately. One point should be made here, though. In my second
preferred embodiment, the memory 40 is separate from the
feature detectors 28, and input patterns are stored in the
memory 40 before any training or recognition occurs.
However, in my first preferred embodiment, the feature
detectors 28 are actually used to implement the memory 40.
In this case, storage of patterns in memory 40 is accom-
plished by the same procedure as training of the feature
detectors 28. Thus, with respect to the first preferred
embodiment, the step shown in FIG. 2 as "Store a set of
patterns in memory" includes the setting of initial random
preferred features, and possibly doing feature training on
some number of patterns.

If recognition is enabled, the input signal 26 is commu-
nicated to the feature detectors 28, which evaluate the input
against their preferred features and produce appropriate
feature activity signal elements 30[1] through 30[M]. (In
some embodiments an equivalent step is done as part of the
training process, too.) The feature activity signal 30
(composed of the elements 30[1] through 30[M]) is used by
the classifier 34 to produce an output signal 36. The output
signal 36 is used by the effector 38 to take an appropriate
action 70 within the system's environment 20.

If training is enabled, the input signal 26 is communicated
to the memory 40, which may store the current input pattern
information, and to the assigner 66. The assigner 66 uses the
stored comparison pattern information, obtained from the
memory 40 via the retrieval signal 68, to segment or parse
the input signal 26 into parts. (In some embodiments the
memory 40 may be implemented using the feature detectors
28 or their equivalent). The assigner 66 then uses the feature
description signal 32 to assign the parts to corresponding
feature detectors 28. The results of this assignment are
communicated via a part mapping signal 44 to the updater
42. The assigner 66 may in some embodiments make use of
the target signal 46 to perform the assignment. The updater
42 modifies the preferred features of the feature detectors 28.
The modification is such that a feature detector 28[m]
increases its preference for the part assigned to it.
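
The training half of a trial can be sketched as follows. This is an illustration under loudly stated assumptions: the parts are taken as given (the segmentation of the input against the memory 40 is not modeled), parts are matched to detectors by smallest squared distance to the feature descriptions, and the update is a simple fractional move toward the assigned part. Neither rule is the procedure of either preferred embodiment, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_parts(parts, descriptions):
    """Stand-in for the assigner 66: map detector index -> assigned part (signal 44)."""
    mapping = {}
    for part in parts:
        m = min(range(len(descriptions)),
                key=lambda i: sq_dist(part, descriptions[i]))
        mapping[m] = part  # a later part overwrites an earlier one for the same detector
    return mapping

def update_detectors(preferred, mapping, rate=0.1):
    """Stand-in for the updater 42: move each detector toward its own part only."""
    for m, part in mapping.items():
        preferred[m] = [w + rate * (p - w) for w, p in zip(preferred[m], part)]

# Two detectors over a 4-element input; a "T"-like pattern split into two parts.
preferred = [[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6]]
parts = [[1, 1, 0, 0],   # e.g. "top horizontal bar"
         [0, 0, 1, 1]]   # e.g. "middle vertical bar"
before = [list(p) for p in preferred]

mapping = assign_parts(parts, preferred)
update_detectors(preferred, mapping)
print(mapping, preferred)
```

Each detector ends up closer to its own part and no closer to the other part, so no single detector is pushed toward the combination of the two parts.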

After a significant number of training trials have occurred,
the feature detectors 28 store valuable information about the
input pattern domain, which may be used to bypass the
training phase in a comparable pattern recognition device.
Thus as shown in FIG. 2, the preferred features of one or
more of the feature detectors 28 may be transferred (which
includes being copied) to one or more comparable devices
after some amount of training. A comparable device would
be one having a transducer similar to the transducer 24, and
one or more feature detectors similar to the feature detectors
28, so as to be capable of making appropriate use of the
trained preferred features.

Implementation Details

Each of my preferred embodiments is implemented using 
a suitably programmed general-purpose digital computer. 
Generally speaking, signals and other representations will 
thus be implemented with storage space in the random 
access memory of the computer. Such an implementation is 
preferred in part because of the high availability and rela- 
tively low cost of such machines (as opposed to, for 
example, analog and/or non-electronic devices). Certain 
experimental manipulations may also be desirable, and these 
will typically be easiest to perform on a general-purpose 
machine via software. Furthermore, those skilled in the art 
of adaptive pattern recognition tend to be most familiar with 
software based implementations of pattern recognizers. Still 
further, such a system, once trained, can easily be used to 
create other systems to perform similar tasks, by copying 
trained weights and/or program code into other recognition 
systems. 

In order to describe the computer program part of the 
preferred embodiments, variable names will be used to 
denote corresponding digital storage locations. These 
variables, along with the system parts which they 
implement, will be given with the preferred embodiment 
specifics below. 
Pseudo-code Conventions 

Some of the drawings make use of a "pseudo-code" which 
largely resembles the C programming language. One reason 
for this is to reduce the number of figures which must be 
used to represent a procedure. I believe this will make the 
overall methods depicted easier to understand by a typical 
pattern recognition programmer than if the methods were 
broken up into still more figures. In fact, the pseudo-code 
should be readily understandable by anyone skilled in C or 
a similar language. However, I will describe the least 
obvious conventions next. 

Assignment to variables is indicated by a "Set var=value"
statement. This is the equivalent of the C assignment opera-
tion "var=value".

A processing loop is indicated by a "For x=begin to end
{loop-body}" statement. Here, loop-body is the code over
which to loop, and x is an integer index variable whose value 
is typically referenced within the loop body. The loop is 
performed first with x equal to begin, and x is incremented 
by one before each successive iteration, until x is greater 
than end, at which point no further iterations occur. 

Conditional code execution is implemented with an "If 
boolvar {conditional-code}" statement. Here, the 
conditional-code statements are executed if and only if the 
expression represented by boolvar evaluates to TRUE 
(nonzero). Sometimes I use an English language expression 
for boolvar, where the evaluation method is apparent. Also, 
a corresponding "Else { }" clause may be used with an "If"
statement, as in C.

An array is often indicated by notation such as
arrayvar[ ], or arrayvar[ ][0]. Such arrays represent vectors,
as they have exactly one dimension with no specified index.
Similarly, arrayvar[ ][ ] would indicate an entire two-
dimensional array, and arrayvar[2][3] would indicate just a
single element of a two-dimensional array. Also, array index 
brackets will be dropped for clarity when the context makes 
the meaning clear. 

The operator "log" indicates a natural (base e) logarithm 
operation. The operator "exp" indicates a base e exponen- 
tiation operation. MIN(x, y) returns the minimum value of x 
and y. 
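
For readers more used to Python than C, the conventions above can be sketched as follows. The variable names and values are illustrative only; the one point that is easy to get wrong is that the "For" loop includes both endpoints.

```python
import math

# A "Set" statement is plain assignment.
total = 0

# "For x=begin to end {loop-body}" includes both endpoints,
# so the Python range must run to end + 1.
begin, end = 1, 5
for x in range(begin, end + 1):
    # "If boolvar {conditional-code}" with an "Else { }" clause;
    # a nonzero value counts as TRUE, as in C.
    if x % 2:
        total = total + x
    else:
        total = total - 1

# arrayvar[ ][ ] is a whole two-dimensional array, arrayvar[ ][0] is a
# vector (one unspecified index), and arrayvar[2][3] is a single element.
arrayvar = [[10 * r + c for c in range(4)] for r in range(3)]
column0 = [row[0] for row in arrayvar]   # arrayvar[ ][0]
element = arrayvar[2][3]                 # arrayvar[2][3]

# "log" and "exp" are base e; MIN(x, y) is the smaller of its arguments.
roundtrip = math.exp(math.log(2.0))
smaller = min(3, 7)

print(total, column0, element, roundtrip, smaller)
```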
Transducer 

At the front end of the system is the transducer 24, which
senses a physical pattern 22 and produces an input signal 26
representative of it. The physical pattern 22 may be virtually
any object or event, or conglomeration of objects and/or
events, that is observable. Similarly the transducer 24 may
be anything capable of detecting such an observable. It may
include, for example, photodetector cells, a microphone, a
camera, sonar detectors, heat sensors, a real-time stock quote
device, a global position device implanted in a blind per-
son's cane, and so on. It may detect electronically stored
patterns, for example a stored hypertext document on a
remote network server. The transducer 24 may also include
one or more humans, as for example when survey results are
observed. Those skilled in the art of adaptive pattern rec-
ognition will readily find many diverse physical pattern
domains to which the present invention may be applied, as
there are a great number of known methods and devices for
sensing an extremely wide variety of patterns in the world.

The transducer 24 is also assumed to handle any neces-
sary "preprocessing" of the physical pattern 22. Preprocess-
ing includes any known, hardwired transformations used to
remove unwanted redundancy in the input, fill in missing
values, and the like. These operations tend to be problem
specific, and there are a great many possible ones. Some
examples are: line extraction in character recognition; band-
pass filtering of audio (e.g. speech) signals; and translation,
rotation, and size normalization of images. It is important to
note, though, that preprocessing is less important when
using an adaptive feature-based device such as that of the
present invention. While still useful, especially in well-
understood domains, appropriate preprocessing can to some
extent be "learned" by the adaptive part of the device.
Because of this, in a worst-case scenario, wherein the system
designer knows virtually nothing about the features con-
tained in a physical pattern domain (and thus what prepro-
cessing operations are appropriate), the present device can
still be used without any preprocessing (i.e. with "raw" input
data).

Those skilled in the art of adaptive feature-based pattern
recognition will be familiar with methods for producing a
sequence of input signals 26 and presenting these to a digital
computer based recognizer in the form of a sequence of
vector values. Thus herein it is simply assumed that the input
signal 26 is available as the variable INPUT[ ]. Note that
transduction (including preprocessing) may be done off-line;
that is, recognition and/or learning may be performed on an
input signal 26 obtained from stored data, as long as
transduction occurred at some time to produce the stored
data.

The variable INPUT[ ] is assumed to be binary with 0/1
values. If necessary, analog information may be converted to
binary using the Albus Method (BYTE magazine, July 1979,
p. 61, James Albus) or another such known method. I believe
there are straightforward extensions of my preferred
embodiments which will work on analog inputs, but since I
have not tested these, binary representations are preferred.
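
As an illustration of turning analog values into a 0/1 vector, the sketch below uses a simple unary ("thermometer") code. This is a generic stand-in chosen for brevity and is explicitly not the Albus Method cited above; the function name, value range, and bit count are assumptions.

```python
def thermometer_code(value, low, high, num_bits):
    """Encode an analog value in [low, high] as num_bits 0/1 values.

    Values outside the range are clipped. Larger values switch on more
    leading bits, so nearby values share most of their bits.
    """
    clipped = min(max(value, low), high)
    fraction = (clipped - low) / (high - low)
    ones = round(fraction * num_bits)
    return [1] * ones + [0] * (num_bits - ones)

# Two analog readings concatenated into one binary input vector.
INPUT = thermometer_code(0.6, 0.0, 1.0, 5) + thermometer_code(0.0, 0.0, 1.0, 5)
print(INPUT)
```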

Effector

The last stage of the recognition process is handled by the
effector 38. The effector 38 takes the output signal 36, in the
form of the vector computer variable OUTPUT[ ], and
produces an action 70 in the system's environment 20 which
is (after learning) as appropriate as possible for the current
input signal 26. As with transduction, this stage is well
known in the prior art and thus will not be detailed here.
Examples of effectors would be gears on a robot, traffic
lights, a speaker, or a digital storage device. Combinations
of different types of effectors might also be used. One use of
a digital storage type effector would be to store an output
signal 36 for future use. Such storage might, for example,
permit the invention to be used in implementing a database
(perhaps of hypertext documents), wherein future queries
would access the digitally stored outputs. In such an embodi-
ment the effector 38 might, for example, store a copy of the
input signal 26 along with an estimated class label obtained
from the classifier 34 via the output signal 36.
Notes on Experimentation 

A certain amount of experimentation is inherent in the
optimal use of adaptive pattern recognition, just because the
pattern domain is never completely understood (or else an
adaptive system would not be required in the first place).
Thus adaptive pattern recognizers are best viewed as tools
for solving a problem, rather than the solution itself.
However, with reasonable experimental techniques, the per-
formance gap between a perfectly optimized recognizer and
a practically optimized one can be made much smaller.
Furthermore, even a minimally optimized recognizer
architecture, once trained, can often outperform any existing
solutions, making it extremely valuable despite being "non-
optimal".

In general, the experimental techniques appropriate for 
my preferred embodiments are the same as those known by 
those skilled in the art of adaptive pattern recognition. I will 
point out herein where any special considerations should be 
made with respect to my preferred embodiments. The Hand- 
book of Brain Theory and Neural Networks (Arbib, ed., MIT 
Press, Cambridge, Mass.) is a very comprehensive reference 
for techniques related to adaptive pattern recognition, and 
also contains numerous references to related prior art refer-
ence material. The sections referring to backpropagation and
unsupervised learning will be especially relevant, and will 
point to other relevant material. References such as these 
should be used to learn about appropriate experimental 
techniques, if not already known. 

Preferred Embodiment 1
Architecture Figure and Flow Diagram 

My first preferred embodiment is described with reference 
to FIGS. 3 through 9. FIG. 3 illustrates the structure of the
first preferred embodiment in more detail than in FIG. 1. The
environment 20, transducer 24, and effector 38 are left out
of FIG. 3 for clarity. FIG. 4 provides a flow chart illustrating 
an outline of the software implementation in more detail 
than in FIG. 2, and FIGS. 5 through 9 provide more detailed 
flow charts of the steps involved. 

Theory

My first preferred embodiment makes use of a so-called 
"noisy-OR" neural network architecture. Radford M. Neal 
provides a good description of the theory of such networks, 
and provides further references ("Connectionist learning of 
belief networks", Artificial Intelligence 56, 1992, pp.
71-113). These references should be used to provide any
required background beyond that provided herein, except
with respect to the learning procedure. My learning proce-
dure is different from the one described by Neal. Another
description of noisy-OR networks is provided by Jaakkola
and Jordan ("Computing upper and lower bounds on like-
lihood in intractable networks", in Proceedings of the
Twelfth Conference on Uncertainty in AI).

A noisy-OR network uses binary (0/1 preferably) units, or
neurons, which are activated according to a so-called noisy
OR function. According to this method, there is a quantity
p[i][j] for each pair of units i and j, representing the
probability that unit j "firing" (having value 1) will cause
unit i to fire as well. (Neal works instead with the values
q[i][j], where q[i][j]=1-p[i][j].) There is also potentially a
"bias" value for each unit, which is essentially a connection
from a hypothetical unit which is always on (firing). My
preferred embodiment uses bias weights only for the
highest-level units, though.
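
The noisy-OR rule can be illustrated with a short sketch. It assumes the standard noisy-OR form, in which a unit stays off only if every active parent independently fails to turn it on; the function name and argument layout are hypothetical and chosen only for this illustration.

```python
def noisy_or_prob_on(p_row, parent_states, bias=0.0):
    """Probability that unit i fires under the noisy-OR rule.

    p_row[j] plays the role of p[i][j]; parent_states[j] is 0 or 1.
    `bias` is the weight from a hypothetical always-on unit
    (0.0 means the unit has no bias connection).
    """
    prob_off = 1.0 - bias
    for p, state in zip(p_row, parent_states):
        if state == 1:
            prob_off *= 1.0 - p  # q[i][j] = 1 - p[i][j] for an active parent
    return 1.0 - prob_off


# Two active parents with p = 0.5 each; the third parent is off.
print(noisy_or_prob_on([0.5, 0.5, 0.9], [1, 1, 0]))  # 0.75
```

Inactive parents contribute nothing, so the cost of evaluating a unit grows only with the number of parents that are currently on.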

In my preferred embodiment the units are arranged in 
layers, with the bottom layer representing the input pattern 
(i.e., corresponding to the input signal 26). The goal in such 
a network is to learn an internal model of the pattern domain. 
This is also referred to as "unsupervised" learning. The 
internal model can also be viewed as a "pattern generator", 
and represents a probability distribution over the input 
pattern space. Ideally, a trained noisy-OR network could be 
used to randomly generate patterns whose distribution 
would very closely match the distribution of the training 
patterns. Because this type of network is most naturally 
viewed as a pattern generator, the connections will be said 
to feed from top to bottom in the network. However, in 
practice data flows both ways along the connections. 

Recognition could occur in at least two basic ways in such 
a network. First, the network could be given a partial input 
pattern and its internal model used to fill in the missing input 
values (assuming during training there were input patterns 
without these values missing). The missing values could 
represent class labels, in which case the network could be 
used for classification. Note that classification learning is 
often viewed as "supervised" learning, but nevertheless a 
so-called unsupervised procedure can still be made to per- 
form a similar task. 

A second way of doing recognition with such a network — 
which is my preferred way — is to use a separate class 
network 50[c] to model each class, as shown in FIG. 3. At 
recognition time, the input pattern is presented to each class 
network 50[c], and each is used to produce a likelihood
value representing the probability that that network would 
generate the input pattern. These likelihood values are 
computed by the classifier 34, which receives from the class 
networks 50 the feature activity signal 30 as well as other 
information needed to compute the likelihoods, such as the 
network weight values and activities of non-hidden units. 
The classifier 34 combines these likelihood values (via the 
well known Bayes Rule) with the prior class probabilities, to 
obtain the (relative) posterior class probability information. 
From this information it computes the index of the most 
probable class, which it communicates via the output signal 
36. Note that in this embodiment, any hidden unit in any 
network can be viewed as one of the feature detectors 28. 
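
The combination of per-class likelihoods with prior class probabilities can be sketched as follows. This is only an illustration of Bayes Rule applied to one likelihood per class network; the function name and the use of log-likelihoods as inputs are assumptions made here for numerical convenience.

```python
import math


def most_probable_class(log_likelihoods, priors):
    """Return (index of most probable class, posterior probabilities).

    log_likelihoods[c] is log P(input | class c), one per class network;
    priors[c] is the prior probability of class c.
    """
    scores = [ll + math.log(pr) for ll, pr in zip(log_likelihoods, priors)]
    top = max(scores)
    # Subtract the largest score before exponentiating to avoid underflow.
    unnormalized = [math.exp(s - top) for s in scores]
    total = sum(unnormalized)
    posteriors = [u / total for u in unnormalized]
    best = max(range(len(scores)), key=lambda c: scores[c])
    return best, posteriors


best, post = most_probable_class([math.log(0.2), math.log(0.1)], [0.25, 0.75])
print(best, post)
```

Only the relative posterior values matter for choosing the index, so the normalization step could be skipped when just the most probable class is needed.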

The separate-networks approach has a drawback, which is 
that feature detectors cannot be shared by different classes. 
However, it circumvents a problem with the missing-value 
approach, which is how to force the network to learn
features relevant for the classification task. 

During recognition, the input signal 26 is presented to 
each class network 50[c] in turn, and class likelihood values 
are computed as described, by the classifier 34. During 
learning, however, the input signal 26 is only presented to 
the class network 50[cTarget] corresponding to the (known) 
target class of the input pattern. Likewise, only the target 
network is trained on the current pattern. Since all the class 
networks 50 operate similarly, the network index will be 
suppressed herein for clarity where it is immaterial. 

Both recognition and learning require an inference pro- 
cess whereby the network 50[c] fits its internal model to the 
current input data (as represented by the current input signal 
26). Typical prior art implementations of noisy-OR networks 
use some sort of iterative inference process, in which 
multiple "activation cycles", or updates of unit activations, 
are performed. My preferred inference process is also of this 
type. Two results of this inference process are especially 
important. First, a likelihood value is produced which
(ideally) represents the probability that the network would
produce the current input. This likelihood value is computed 
by the classifier 34 to facilitate classification. Second, the 
inference process produces a mapping from feature detectors 
28 to input parts. In this case, each non-input unit is a feature 
detector 28[m], and the "input part" corresponding to a unit 
is that part of the activation pattern in the layer below which 
the unit is judged to be "responsible" for producing (keeping 
in mind that we are viewing the network as a pattern 
generator). 

My preferred inference process is Gibbs sampling, a 
technique known in the prior art. It is a statistically based 
process, involving successive random sampling of the units' 
activity states (on or off). Each unit is visited in turn, and its 
activity is selected from a distribution conditional on the 
current states of all the other units in the network. If this 
process is run "long enough", the distribution of network 
states will come to mirror their likelihoods, i.e. their respec- 
tive probabilities, given the network's input data. An aver- 
age over several such states can thus provide an estimate of 
the overall likelihood of the network model. 
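
A minimal sketch of one such sampling scheme is given here for a two-layer noisy-OR network with clamped inputs. It is a straightforward, unoptimized form of Gibbs sampling, not the update strategy of the preferred embodiment. The function names, the example weights, the bias values, and the cap of 0.99 on each off-probability (included so that no state has exactly zero probability) are all assumptions of the sketch.

```python
import random


def joint_prob(hidden, visible, weights, bias):
    """Probability of one complete state of a two-layer noisy-OR net."""
    prob = 1.0
    for j, h in enumerate(hidden):
        prob *= bias[j] if h else 1.0 - bias[j]
    for i, v in enumerate(visible):
        prob_off = 1.0
        for j, h in enumerate(hidden):
            if h:
                prob_off *= 1.0 - weights[i][j]
        prob_off = min(prob_off, 0.99)  # assumed cap, avoids zero probabilities
        prob *= (1.0 - prob_off) if v else prob_off
    return prob


def gibbs_cycle(hidden, visible, weights, bias, rng):
    """Visit each hidden unit once and resample it given all the others."""
    for j in range(len(hidden)):
        hidden[j] = 0
        p_off = joint_prob(hidden, visible, weights, bias)
        hidden[j] = 1
        p_on = joint_prob(hidden, visible, weights, bias)
        hidden[j] = 0 if rng.random() < p_off / (p_off + p_on) else 1


# weights[i][j]: probability that hidden unit j turns on input unit i.
weights = [[0.9, 0.02], [0.9, 0.02], [0.02, 0.9], [0.02, 0.9]]
bias = [0.5, 0.5]
visible = [1, 1, 0, 0]   # clamped input pattern
hidden = [0, 0]          # hidden units start off
rng = random.Random(0)
on_counts = [0, 0]
for _ in range(200):
    gibbs_cycle(hidden, visible, weights, bias, rng)
    on_counts = [c + h for c, h in zip(on_counts, hidden)]
print(on_counts)
```

In this example hidden unit 0 explains the two active inputs, so it should be sampled "on" in nearly every cycle, while hidden unit 1 should stay mostly off.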

One of the virtues of this embodiment is that the inference 
process is iterative, and incorporates feedback. This effec- 
tively allows upper layers to influence how lower layer units 
are activated, when multiple hidden layers are used. Such 
top-down influences can lead to more flexible and accurate 
recognition overall. However, this extra power comes at a 
price: more processing time is required, relative to methods 
which are strictly feedforward. This extra processing can be 
minimized, though, by stopping the iterations when the 
changes they produce become less than some criterion. 

Gibbs sampling is also used for learning purposes. For 
each state of the network, a responsibility value r[i][j] can be 
computed for each pair of units i and j, representing the 
responsibility that unit j had for causing unit i to fire. Note 
that r[i][j] is not the same as the value p[i][j] mentioned 
above. The p[i][j] is a hypothetical probability; the prob- 
ability that unit i would fire if unit j were to fire. The value 
r[i][j], on the other hand, represents the effect unit j actually 
had on unit i, given a particular instantiated network state. 

The array of responsibility values for two connected 
layers of units constitutes a segmentation; it indicates which 
"parts" of the lower layer activities "go with" which units in 
the upper layer. Each unit in the upper layer is judged 
responsible for a certain part of the activity in the lower layer 
(unless it is inactive, in which case it is not responsible for 
anything). Viewing the upper layer units as feature 
generators, the responsibilities indicate what features were 
generated to create the lower layer activities, and which 
units generated which features. The units can also be viewed 
as feature detectors of course, and a unit's preference for a 
given feature is directly related to its probability of gener- 
ating that feature. Note also that units can share responsi- 
bility for a given "on" unit in the layer below. This will be 
referred to as "soft segmentation", as opposed to "hard 
segmentation" wherein only one unit is allowed to be 
responsible for an "on" unit. 

Learning occurs by moving the p[ ][j] values for unit j
toward the corresponding r[ ][j] values for unit j. Put
differently, we can view the vector of unit j's responsibilities
as the "part" of the input it is responsible for, and its vector
of outgoing weights (the p[ ][j] values) as its preferred
feature. In those terms then, the learning procedure is to 
make unit j's preferred feature directly more similar to the 
input part to which it was assigned. The details of the 
method are given below. 
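
The idea of moving a unit's preferred feature toward its assigned input part can be pictured with a small sketch. It is illustrative only: the fixed step size `rate` and the function name are hypothetical, and the actual update procedure of the embodiment is the one detailed later in this description.

```python
def move_toward(weights_j, responsibilities_j, rate):
    """Move each p[i][j] a fraction `rate` of the way toward r[i][j].

    weights_j holds unit j's outgoing weights (its preferred feature);
    responsibilities_j holds the input part assigned to unit j.
    `rate` is a hypothetical step size between 0 and 1.
    """
    return [p + rate * (r - p) for p, r in zip(weights_j, responsibilities_j)]


print(move_toward([0.2, 0.8], [1.0, 0.0], 0.5))
```

With a rate of 1.0 the preferred feature would be replaced outright by the input part; smaller rates blend the new part into what the unit has already learned.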

Notice that in this embodiment, the networks 50 not only
contain the feature detectors 28 (that is, the "hidden units",
possibly multiple layers of them), the networks 50 also are
used to implement the memory 40. This is a beneficial
outcome of using this type of unsupervised network:
because the noisy-OR network used is designed to model its
input environment, it is essentially a (lossy) memory of the
past inputs from that environment on which it has been
trained. Furthermore, if there are multiple layers of units,
these will also be operable as multiple memories of the
appropriate types: each layer of units, and its connections to
the layer below it, implements a memory for patterns of
activity over that lower layer. FIG. 3 indicates the use of the
networks 50 in implementing the memory 40 by dataflow
arrows in both directions between the two subsystems.

Implementation

As mentioned, the core of the first preferred embodiment
is implemented in software on a general-purpose digital 
computer. Thus a conceptual mapping exists between the 
structural subsystem description of FIG. 3 and the concrete 
implementation description. This mapping is as follows. 

The input signal 26 is implemented by the storage and
subsequent retrieval from computer memory of a variable
INPUT[ ] (note that the term "computer memory" should not
be confused with the "memory 40", although the former is
used in implementing the latter, of course). The feature
detectors 28 include all the hidden units of the networks 50.
The preferred features of the feature detectors 28 for a given
network are stored in computer memory as an array variable
WEIGHT[ ][ ][ ]. The feature description signal 32 is
implemented by storage and retrieval of appropriate ele-
ments of the WEIGHT array from computer memory. The
feature activity signal 30 is implemented with the storage
and retrieval of an array ACT[ ][ ]. Implementation of the
feature detectors 28 includes program code which computes
the elements of ACT[ ][ ], using Gibbs sampling. The
classifier 34 is implemented by program code which com-
putes a value for the variable OUTPUT; this includes code
for computing individual network likelihoods, which are
combined to produce OUTPUT. Storage and retrieval of
OUTPUT implements the output signal 36.

The memory 40 includes the code which performs Gibbs
sampling to elicit likely network (ACT[ ][ ]) states. Imple-
mentation of the retrieval signal 68 includes storage and
retrieval of PROBOFF variables for the network units. The
memory 40 makes use of the feature activity signal 30 in
computing the retrieval signal 68 (PROBOFF values). As
will be elaborated below, the PROBOFF values for a given
layer are a (particular kind of) combination of WEIGHT
values from the feature detectors 28 in the layer above. Thus
the memory 40 is a lossy memory, since the WEIGHT values
cannot in general store an arbitrary number of patterns
without some loss of information.

The assigner 66 is implemented with code that computes
responsibilities for the network connections. This code is
part of the weight updating code of FIG. 9, which computes
the responsibilities implicitly as described below. The part
mapping signal 44 is implemented with temporary storage
within the weight updating code of FIG. 9. This code block
also implements the updater 42. The target signal 46 is
implemented by a variable called TARGET, indicating the
target class of the current physical pattern 22.

Architecture and Parameter Selection

Certain aspects of the system architecture will be deter-
mined by the problem to be solved. The number of networks,
C, will be equal to the number of classes to be recognized
(e.g., in recognizing lower case English letters, C would be
26). The number of input units (those in the bottom layer)
will normally be the same for each of the networks, and will
be determined by the input representation chosen. Recall
that this representation is preferred to be 0/1 binary, but
other than that, its composition is left to the designer. The
creation of an appropriate input representation is a common
task in the prior art pattern recognition literature.

Other aspects of the architecture will require educated 
guesses and possibly experimentation to achieve optimal 
performance. Again, this is characteristic of prior art devices 
as well. For example, the number of layers of units in each 
network could be varied. My experiments have only been 
performed with two- and three-layer networks (i.e., one and 
two layers of connections), but I have no reason to believe 
good results would not be obtained with more layers than 
this. Indeed, I believe more layers would be beneficial for
many problems where the input domain contains large 
amounts of redundancy, such as raw images. The combina- 
tion of this with the use of limited receptive fields (a 
technique now well known in the neural network literature) 
will probably be especially useful. As a general rule, the 
harder it is (for a person) to describe a class in terms of the 
input features, the more helpful it may be to have additional 
layers. However, a two-layer network is still preferred, with
more layers added only if experimental resources allow. This 
also simplifies interpretation of the nomenclature herein: the 
activations of the single hidden layer are represented by the 
feature activity signal 30 — that is, the hidden units corre- 
spond to the feature detectors 28 — and the input units 
receive the elements of the input signal 26. The description 
herein considers the number of layers a variable, though 
(NUMLAYERS), to make experimentation with additional 
layers straightforward. 

The numbers of units in each (non-input) layer are also 
parameters of the embodiment, as with prior art neural 
networks. The first number tried should be the best guess at 
the number of independent features in the input domain 
("input domain" here means the activities of the next lower 
layer of units). A typical method of experimentation is to 
start with a very small number of units in each hidden layer, 
and increase the number after each training run, as long as 
performance of a trained system (on a cross-validation data 
set) improves and experimentation time allows. It is also 
typical to find better overall performance when the number 
of units decreases from a given lower layer to an upper layer. 
This is because one job of the unsupervised network is to 
remove redundancy, and fewer units are required to repre- 
sent the same information, as more redundancy is removed. 

Because my preferred embodiment is strictly layered, 
with no connections which "skip" a layer, it is convenient to 
view the weight values for a given network as a three- 
dimensional matrix, where the first index corresponds to the 
layer number, the second to the receiving (lower layer) unit, 
and the third to the sending (upper layer) unit. Thus I will 
use the variable WEIGHT[LAY][i][j] to represent the weight
value from unit j in layer LAY+1 to unit i in layer LAY 
(where the layers are indexed starting with 0 for the input 
layer). 
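
The indexing convention can be made concrete with a small sketch. The helper name `make_weights` and the example layer sizes are hypothetical, and the bias weights of the top layer are left out to keep the illustration short.

```python
def make_weights(layer_sizes, fill=0.0):
    """Build weights[lay][i][j] for a strictly layered network.

    layer_sizes[0] is the number of input units. weights[lay][i][j] is
    the weight from unit j in layer lay+1 to unit i in layer lay.
    """
    return [
        [[fill] * layer_sizes[lay + 1] for _ in range(layer_sizes[lay])]
        for lay in range(len(layer_sizes) - 1)
    ]


# Three layers of units (16 inputs, 4 hidden, 2 top) give two layers
# of connections.
weights = make_weights([16, 4, 2])
print(len(weights), len(weights[0]), len(weights[0][0]))  # 2 16 4
```

Because the layers may have different sizes, a list of two-dimensional tables is used here rather than a single rectangular array.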

Regulation of Trials (Pattern Presentations) 

As shown in FIG. 4, the overall course of learning and 
recognition is divided into trials, each trial involving the 
presentation of a single input pattern. Generally speaking, 
the user and the problem being addressed will determine on 
which trials learning and/or recognition will be enabled. 
Obviously, recognition will not be very good to the extent 
that learning has not occurred. Preferably though, classifi- 
cation error should be evaluated throughout learning on a 
separate cross-validation data set, and learning (which is 
done on a training data set, not the cross-validation set)
terminated when the cross-validation error bottoms out and
begins to increase. This technique is well known in the art. 
Other techniques may also be useful, however. For example, 
learning might be enabled throughout the lifetime of the 
device, perhaps to allow continual adaptation to a nonsta- 
tionary environment (in such a case, it would be inappro- 
priate to decrease the learning rate over time, though — see 
below). 
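
The stopping rule based on cross-validation error can be sketched as follows. The sketch assumes that one error value is recorded after each pass through the training set; the function name and the choice to stop at the first increase are assumptions made for illustration.

```python
def stopping_point(validation_errors):
    """Return the index of the pass after which learning should stop.

    validation_errors[k] is the cross-validation error measured after
    training pass k. Learning stops at the last pass before the error
    first increases, or at the final pass if it never increases.
    """
    for k in range(1, len(validation_errors)):
        if validation_errors[k] > validation_errors[k - 1]:
            return k - 1
    return len(validation_errors) - 1


print(stopping_point([0.50, 0.30, 0.20, 0.25, 0.10]))  # 2
```

In practice the weights saved at the returned pass would be kept, since the pass that reveals the increase has already moved the weights beyond the minimum.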

Training patterns should be chosen independently and at 
random according to the input distribution to be learned. 
Note that the memory 40, which is implemented using the 
feature detectors 28 in this embodiment, does not contain 
any patterns at first (although the initial random weights 
could be viewed as representing hypothetical stored 
patterns). After one or more training trials, however, it is 
considered to have approximately stored the trained pat- 
terns. These stored patterns thus become the comparison 
patterns for future training trials, and are used in finding 
likely features or parts within each future input signal 26. 

Before any learning, all weights (p[i][j] values) of all C 
networks should be initialized to small random values. 
These are stored in the array elements WEIGHT[LAY][i][j]. 
Preferably these should be uniform random in the range 0.02 
to 0.04, but this could be an experimental parameter if 
resources allow such experimentation. During learning, the 
weights are preferably maintained in the range 0.01 to 0.99 
(by resetting any weight that is beyond a limit back to that 
limit, after normal weight updating). The purpose of this is 
to prevent learning becoming "stuck" due to extremely low 
likelihoods, and to help prevent computed probabilities from 
exceeding machine-representable values. However, if
experimentation is possible, and it is known or believed that 
important features in the input domain may occur with 
probabilities beyond this range, then these limits should be 
adjusted to compensate. 
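
The initialization and clipping of weights described above can be sketched as follows. The ranges are the preferred values given in the text; the function names and the use of Python's `random.Random` generator are assumptions of the sketch.

```python
import random


def init_weight(rng):
    """Draw an initial weight uniformly from the preferred range."""
    return rng.uniform(0.02, 0.04)


def clip_weight(w, low=0.01, high=0.99):
    """Reset a weight that has moved beyond a limit back to that limit."""
    return min(max(w, low), high)


rng = random.Random(0)
initial = [init_weight(rng) for _ in range(5)]
print(initial)
print(clip_weight(1.2), clip_weight(-0.1), clip_weight(0.5))  # 0.99 0.01 0.5
```

Clipping would be applied after each normal weight update, so a weight can approach a limit during learning but never pass it.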

There are two variables for each unit i, COUNT[i] and
COUNTBIAS[i], which are used to count training trials, as 
further explained below. These must be initialized to zero 
before any training trials are performed. 

On each trial for which training is enabled, a variable 
TARGET is set to the index of the target class for the current 
physical pattern 22. A test will be performed during the loop 
over networks (as shown in FIG. 4) to determine if the 
current class, c, is equal to TARGET. If so, processing, 
including training, will continue on network c.

Regulation of Cycles (Gibbs Sampling Iterations)

As shown in FIG. 4, an important part of each trial is a 
loop over "cycles". This is done separately for each enabled 
network (just the target class network, if only training is 
enabled, and all class networks if recognition is enabled). 
The process is the same for each network, though, so here it 
will be discussed in the context of a single network. 

Each cycle includes a single Gibbs sampling of each unit 
in the network, as well as a computation of the likelihood of 
the activation state produced. Also, two variables for each 
unit, PROBOFF and NETOFFBELOW, which are described 
below, are updated on each cycle. If training mode is 
enabled, weights are also updated on each cycle. 

The reader is referred to the prior art literature to review 
the theory behind Gibbs sampling. The basic idea, though, is 
that each unit's activation is periodically sampled according 
to its probability conditional on the current activations of all 
the other units. Eventually, using this procedure, each over- 
all network state will occur approximately with a frequency 
according to its overall probability (given the instantiated 
network input). This is a useful property because it is often 
very difficult to direcdy compute the probability of a net- 
work state conditional on a given input. 
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Each time a unit's activation is sampled, two values must hidden (non-input) units. For each non-input layer, all the 

be computed: the probability of the entire network (given all unit activations are initialized to zero, 

current values of the other units) if the unit were to have Also before cycling, a random sampling order is chosen 

activation 0, and the probability of the network if the unit for each layer of the network This is simply a (uniform) 

had activation 1. The probability that Gibbs sampling will 5 random permutation of the unit indices of the layer, which 

assign the unit an activation of 0 is just the first of these two is used as the order to perform Gibbs sampling within the 

values, divided by their sum. If a 0 activation isn't assigned, layer. These indices are stored as the variable ORDER 

then the unit takes on a value of 1. [LAYJ ], where LAY is the layer index, and the other index 

A theoretically equivalent way of doing the same thing is ranges over the units in that layer. Note that it may work well 

to compute the probability that a unit's activatioa should 10 to use a different random order on each cycle, but my tests 

change. This is the method of my preferred embodiment. It have not done this, and it requires somewhat more time, so 

turns out with the noisy-OR architecture that a given unit it is not the preferred method. 

will only be affected by a certain group of other units. Id The PR OB OFF values for the units are initialized as 

particular, a unit's parents, and children, and "siblings" follows. For the top layer in the network (layer number 

(other parents of its children) are the only ones which need 15 NUMLAYERS-1, where the bottom layer is layer zero), the 

be considered when sampling a unit's activation. PROBOFF value for each unit is just 1.0 minus the bias 

My preferred embodiment employs a strategy which is a weight for the unit; that is, PROBOFF{NUMLAYERS-l] 

further improvement over a straightforward implementation [i]^1.0-\VTIGHTi;NUMLAYERS-lIi][0]. (Note the bias 

of Gibbs sampling. This strategy takes advantage of the fact unit is considered to be a fictitious unit 0 in layer 

that many computed values do not change from one cycle to 20 NUMLAYERS.) For each non-top layer, all of its units' 

the next, especially on later cycles. Thus, an "update" PROBOFF values are initialized to 1.0, reflecting the fact 

strategy is employed, whereby certain useful quantities are that all non-input units are initially off. 

maintained from cycle to cycle, and updated whenever other The NETOFFBELOW variable for each unit is initialized 

changes in the network state require. These updates typically as follows. For units with no children (input units), NET- 

consume less processing time overall than would recomput- 25 OFFBELOW is set to zero (and is always zero). For each 

ing the values on each cycle. other unit j in a non-input layer LAY, NETOFFBELOW is 

Two main variables are maintained for each unit, desig- the sum over inactive child units i of -log(1.0-WEIGHT 

nated herein as PROBOFF and NETOFFBELOW. The [LAY-lIi]U]). (Note this is the natural logarithm, i.e. base 

PROBOFF value for a unit represents the probability that a e.) Note that since all but the input units start out with 

unit is off given its parents — a quantity which is very useful 30 activations of zero, all but the units in layer 1 (parents of the 

in computing a unit's probability conditional on the rest of input units) will compute this sum over all their children, 

the network. Since computation of PROBOFF involves a The UNITPROB variable for each unit is initialized to 

product over the unit's "on" parents, it only needs to be one, for all units. This variable will be used to accumulate 

updated when the activation of a parent unit changes, or the the (product of the) units' individual contributions to the 

connection weight from a parent changes. Furthermore, the 35 overall network likelihood, which is computed over the 

update need only deal with the changed parent activation, course of all the cycles performed, 

rather than iterating over all parents again. Two other variables are used for each unit as well: 

Whereas PROBOFF may be considered a contribution COUNT and COUNTBIAS. These are used to keep track of 

from a unit's parents to its activation probability, the NET- the number of training cycles for which the unit has been 

OFFBELOW value for a unit stores the contribution from 40 active (for COUNT) or has been either active or inactive (for 

"off" child units. It only needs to be changed when a child's COUNTBIAS). These variables are used in training, to 

activation changes, or a connection weight to a child reduce the amount of feature modification which is done 

changes. This value is very useful because in computing a over time, thus helping the training process to converge, 

unit's probability, the contribution from all "off" child units Gibbs Sampling and Unit Variable Updating 

is computed by simply summing NETOFFBELOW with 45 FIG. 6 (along with FIG. 7) illustrates the Gibbs sampling 

contributions from other units. Furthermore, NETOFFBE- process for a single cycle, in more detail. The overall 

LOW is itself just a sum of (negative) logs of the appropriate structure is two nested loops, the outer loop iterating over 

l-p[i][j] values; i.e., it requires no multiplications or divides layers in the network (bottom to top), and the inner one 

to compute (a table lookup could be used to speed up the log iterating over units within each layer. The majority of the 

operation, and/or each connection's -log(l-p[i][j]) value 50 processing, as described next, occurs for a particular unit, 

could simply be stored). The overall implication of this is the index of which is chosen from the permuted index list 

that the contribution from "off" children is very fast to ORDER[LAY][u]. 

compute. Moreover, in many applications the ratio of "off" The process for sampling a value for the current unit's 

to "on" units may be considerably higher than 1.0. To the activation, stored in ACT[LAY][i], is illustrated in FIG. 7. 

extent this is true, the overall time to perform Gibbs sam- 55 Note that, as illustrated in FIG. 6, no sampling is done on 

pling can be much faster with my method. units with clamped activations. In my preferred embodiment 

Initialization Before Gibbs Cycles all the input units' activations are clamped, and no other 

Before any cycles occur, certain variables are initialized, units are clamped. However, a distinction between 

as shown in FIG. 5. The activations of the input layer, "clamped" and "input" units is made here to assist those 

represented by the variables ACIX0][0] . . . ACIt0][N-l], 60 skilled in the art who may wish to experiment with alter- 

are set to be equal to the input pattern as stored in the array native embodiments in which this is not true. 

INPUT[0] . . . INPUTtN-l]. These values will be "clamped" The strategy behind the procedure in FIG. 7 is to compute 

during Gibbs sampling, meaning they are unchanged (not the probability that the current unit should change its 

sampled). However, in other embodiments, in which some activation, based on all the other units' current activations, 

input values are missing, it would be appropriate to allow the 65 The variable NET is used to accumulate "evidence" for the 

missing values to be "filled in" by Gibbs sampling, by necessity of an activation change. NET is used as input to a 

treating the corresponding input units like the network's sigmoid function, which outputs the probability of change. 
This probability value is compared to a random real value
between 0 and 1 inclusive to determine if a change is
actually made.

NET is initialized using the contribution from the current
unit's parent units. Most of the computation which goes into
this has already been done, in the ongoing updates of the
PROBOFF value for the current unit. The contribution is,
theoretically, the (natural) log of (1-PROBOFF), minus the
log of PROBOFF. This assumes that the current activation is
zero; if it is not, the contribution must be multiplied by -1.
Notice that an adjustment is made to the theoretical value,
though. In particular, instead of using PROBOFF directly,
we use the smaller of PROBOFF and a constant, 0.99. This
is done for the same reasons that the weights are clipped: to
prevent problems of machine representation of small
numbers, and to keep the Gibbs sampling from getting
"stuck" due to extreme probabilities. Once again, though, if
there is a reason to believe that this value 0.99 is too
restrictive given the problem at hand, then experimentation
should be done with a less restrictive (larger) value.

The second contribution to NET comes from the child
units which are "off". Again, this has essentially already
been computed via our scheme of updating, this time in the
variable NETOFFBELOW. In particular, NETOFFBELOW
is subtracted from NET. This assumes again that the current
unit is "off". If it is not, NETOFFBELOW should be added
to NET; this is done by the following conditional, as shown
in FIG. 7.

The contribution from "on" child units cannot be easily
computed from running variables, as can the other
contributions; it must be recomputed each time by iterating over
all the (on) child units. This is done next in FIG. 7. For each
"on" child unit, we must compute the probability of that
child having its current value, under two scenarios: (1) the
current unit's activation changes, and (2) it doesn't change.
In fact, it is the log of the ratio of these two probabilities
which is added to NET, for each "on" child. This basic
procedure is somewhat complicated by some enclosing
conditionals. The purpose of these conditionals is just to
handle the abnormal cases wherein one or the other, or both,
of the probabilities is zero.

As shown in FIG. 6, once an activation has been selected
for the current unit (ACT[LAY][i]), a check is done to see
whether the activation changed (the previous value must
have been stored, of course). If it has changed, then the
running variables PROBOFF and NETOFFBELOW must be
updated for all other units which might be affected.

The PROBOFF value for a unit keeps track of its
probability of being off, given its parents. Thus any units in layer
LAY-1 need to have their PROBOFF value updated (of
course if LAY is the input layer, there will be no such units).
For each child unit k, this is done by either multiplying or
dividing PROBOFF[LAY-1][k] by the probability that unit
i would not turn unit k on, that is, by the quantity
1-WEIGHT[LAY-1][k][i]. Whether a multiplication or a
division is performed depends on whether unit i is now off
or is now on. Notice that topmost layer units will never have
their PROBOFF values changed during cycling, because
their only parent is the (hypothetical) bias unit, which has a
constant activation of 1.

The NETOFFBELOW value for a unit i keeps track of the
contribution to its probability from its "off" children. Thus
any units in layer LAY+1 need to have their NETOFFBELOW
value updated, since unit i in layer LAY has now
changed its activation (of course if LAY is the topmost layer,
there will be no such units). For each parent unit j, this is
done by either adding to or subtracting from the variable
NETOFFBELOW[LAY+1][j], the quantity
-log(1-WEIGHT[LAY][i][j]). Whether an addition or a subtraction
is performed depends on whether unit i is now off or on.

After performing Gibbs sampling for all the units, another
double loop is performed as shown in FIG. 8. Each unit in
the network is again visited in turn (random index selection
is not necessary here), and the UNITPROB value for each
unit is updated. UNITPROB is ultimately used to estimate
the likelihood of the entire network model, based on the
current input pattern. This likelihood is the product of the
individual unit probabilities (the probability each has its
current activation, given the input). Furthermore, this
quantity should be computed over a reasonably large sample of
Gibbs cycles. My preferred method is to compute it over all
the cycles (which is 20, preferably). Thus on each cycle,
each unit's UNITPROB is simply multiplied by its
PROBOFF value, if it is off, or 1 minus its PROBOFF value,
if it is on (the UNITPROBs were initialized with value 1),
as shown in FIG. 8.

In practice, it may be preferable to do the UNITPROB
computation in the log-probability domain, however, since
multiplying many probability values together can create
numbers which are too small to represent on some
computers. In this case UNITPROB would be a sum of logs
(initialized to zero) and the update would be to add
log(PROBOFF) if the unit is off, and add log(1-PROBOFF) if
the unit is on. While this procedure may avoid representation
problems, it will also require more computation unless a log
lookup table is used.

To the extent experimentation is possible, it may also be
useful to try computing UNITPROB values just over the
later cycles, for example just the last half of the cycles. This
could potentially produce a more accurate estimate of the
true network likelihood, because the Gibbs sampling will
have had more time to settle toward the true distribution.
However, there is a tradeoff when the total number of cycles
is limited (as it must be in practice), because reducing the
number of cycles used to do the estimation also reduces the
quality of the estimate. Experimentation is the only way to
find an optimal tradeoff; however, I believe my method will
produce good estimates in general.

Feature Modification

After Gibbs sampling of each unit, and updating of
appropriate running variables, feature modification
(learning) occurs for the cycle, as shown in FIG. 9. Of
course, this is assuming that training mode is enabled; if the
system were in recognition-only mode, no feature
modification would take place.

The first step shown in FIG. 9 is to set a learning rate
variable, LRATE, to 1.0. Since LRATE gets multiplied by
each potential weight change, using a value of 1.0 is
equivalent to not using a learning rate at all. However, one
is used here because certain modifications of the preferred
embodiment might require one, so it is useful to illustrate
how LRATE should be used in the more general case.

As with Gibbs sampling and the updating of UNITPROB
values, learning is done within a nested double-loop, over
layers and units within each layer. The units are visited in
turn, rather than according to a random index order, in my
preferred embodiment. However, if experimentation is
possible, I advise trying a modified embodiment in which
the units within a layer are visited in a different random
order on each cycle. This is because PROBOFF values for
the layer below are modified during training of a unit, and
this affects future training of other units in the layer. Thus,
in my embodiment there is a bias according to a unit's index.
While I don't believe it would make a significant
improvement to remove this bias with random indexes, it is possible
that it could for some recognition tasks.

As each unit is visited, 0.05 is first added to its COUNT- 
BIAS value. This variable keeps track of the number of trials 
of learning which the unit has "experienced" so far. The 
value is 0.05 because 20 cycles are used in my preferred 
embodiment, and 0.05=1/20. A similar variable, COUNT,
keeps track of just the number of training trials for which the 
unit was active. COUNT is updated within the conditional 
described next. 

The weights leaving a given unit i (those to the layer 
below) are only modified if unit i is active. If it is, its 
COUNT variable is updated as just mentioned, and then a 
loop is entered to iterate over child units of i. 

For each child k of unit i, we can compute an associated
"responsibility" value, representing the responsibility unit i
had in causing k to be active. If k is not active, this
responsibility is zero. Otherwise, the responsibility is
determined by dividing WEIGHT[LAY-1][k][i] by the quantity
1-PROBOFF[LAY-1][k]. This is essentially the prior
probability that unit i would turn k on (WEIGHT[LAY-1][k][i]),
divided by the prior probability that k would be on given all
its parents' current activations. Note that we say "prior"
here, because these probabilities do not take into account
whether or not k is actually on as a result of Gibbs sampling.

The array of responsibilities for all of unit i's children
constitutes the "part" of the pattern of activity in the child 
layer which has been assigned to unit i. The goal of learning 
is to move unit i's preferred feature — namely, its vector of 
weights going to its children — toward its assigned part. Thus 
we can view the vector of i's responsibilities as the "target" 
toward which we want its weights to be modified. 

The upshot of this, in terms of an actual procedure for
each weight, is that WEIGHT[LAY-1][k][i] should be
moved toward zero, if unit k is not active on this Gibbs
cycle, and toward
WEIGHT[LAY-1][k][i]/(1-PROBOFF[LAY-1][k]) otherwise.
(Remember that no changes are
made at all unless unit i is active.) This is what the procedure
of FIG. 9 does, although it does not explicitly compute the
responsibility (target) value. Furthermore, the actual amount
of the change is determined by LRATE and the COUNT
value for unit i.

I believe the procedure of reducing the effective learning 
rate (i.e., LRATE/COUNT) using COUNT is the best way to 
achieve a balance of fast learning and convergence toward 
a stable solution. However, there are two related situations 
where this would not be so appropriate, and thus these 
situations are not preferred applications of my preferred 
embodiment. The first situation is where input patterns for
the recognition system are not chosen independently and at 
random. The second situation is where the patterns are 
chosen at random, but the distribution changes over time (is 
"nonstationary"). In either of these cases, there could be an 
unwanted "primacy effect" due to the fact that more training 
is done on earlier patterns than on later ones. Although I do 
not recommend applying my preferred embodiment to such 
cases, if it were to be attempted, I believe the most
appropriate approach would be to use a constant LRATE of
considerably less than 1.0, and to not divide by COUNT.

After a weight is updated, it is then clipped to lie in the 
range 0.01 to 0.99, as discussed previously. Also, the 
PROBOFF and NETOFFBELOW values which depend on 
the just-modified weight are updated as appropriate. Note 
that while this might seem to incur a lot of computation, 
since there are so many weights, and a multiply and a divide 
are required each time, the situation is not as bad as it first 
appears. This is because learning only takes place for 
weights from an active unit, and furthermore in many
applications there will be fewer active units than inactive 
ones. 
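
The weight update and clipping described above can be sketched as follows. This is only an illustration, not the procedure of FIG. 9: the function name, the list-based data layout, and the exact form of the step (the weight moves toward its target by LRATE/COUNT times the remaining gap) are assumptions, since the text states only that the change is determined by LRATE and COUNT. The sketch also assumes each PROBOFF value is below 1, and it omits the PROBOFF and NETOFFBELOW updates that follow a weight change.

```python
def update_outgoing_weights(weights, child_active, child_proboff,
                            count, lrate=1.0, w_min=0.01, w_max=0.99):
    """Return updated weights from one ACTIVE unit i to its children.

    weights[k]       -- WEIGHT[LAY-1][k][i] for each child k
    child_active[k]  -- 1 if child k is on in this Gibbs cycle, else 0
    child_proboff[k] -- PROBOFF[LAY-1][k] (assumed to be below 1)
    count            -- COUNT for unit i (assumed to be positive)
    """
    step = lrate / count  # effective learning rate, LRATE/COUNT
    new_weights = []
    for w, active, p_off in zip(weights, child_active, child_proboff):
        # Target is the responsibility: zero for an inactive child,
        # otherwise w / (1 - PROBOFF).
        target = w / (1.0 - p_off) if active else 0.0
        w = w + step * (target - w)      # assumed form of the step
        w = min(max(w, w_min), w_max)    # clip to the range 0.01 to 0.99
        new_weights.append(w)
    return new_weights
```

For example, with COUNT equal to 2 a weight of 0.5 moves to 0.75 for an active child whose PROBOFF is 0.5, and to 0.25 for an inactive child.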

Once a unit's outgoing weights have been modified (or
not, if it was not active), a test is made to decide if its bias
weight should be modified. Only topmost units even use a
bias in my preferred embodiment, so that is one condition of
the test. Also, bias weights are only updated on the last of the
(20) cycles. For the most part, updating a bias is the same as
updating any other weight. There is no need to test if the
parent unit is active, though, because the bias unit is always
active, albeit in a hypothetical sense.

Another exception is that my preferred embodiment
maintains bias weights in the range 0.01 to 0.25. This is because
my experiments showed that allowing a bias to grow too
large could allow it to "dominate" the others: it would grow
large, and would thereby take responsibility for nearly all
inputs, thus creating a vicious circle in which other units
could never "win" any inputs. However, as with the range
limitation of the other weights, if there is reason to believe
that the top-level "true" features in the pattern domain can
occur with probability greater than 0.25, experimentation
with an appropriately increased maximum should be done
insofar as possible.

When a bias is changed, the PROBOFF value for the unit
must also be updated. Since only topmost units have biases,
though, and they have no other incoming connections other
than biases, this is a simple update procedure, as FIG. 9
indicates.

Exiting the Cycle Loop 

As shown in FIG. 4, once a cycle of Gibbs sampling has
been done, along with the corresponding updates of weights
and other variables (such as PROBOFFs), a check is made
to decide whether to exit the cycle loop. In my preferred
embodiment, as mentioned above, the loop is exited after 20
cycles have been performed. Another possible embodiment,
though, would be to exit the cycle loop as soon as the
amount of changes to the unit activations has become small,
according to some measure. For example, the loop might be
quit after two complete cycles had failed to produce any
activation changes, or after 5 cycles wherein less than 2
percent of the unit activations changed. Obviously there are
an unlimited number of similar strategies.
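
One such exit test might look like the sketch below. The function and parameter names are hypothetical, the cycles are assumed to be consecutive (the text does not say), and the cap of 20 cycles is borrowed from the preferred embodiment only as a placeholder for whatever maximum is chosen.

```python
def should_exit_cycle_loop(change_fractions, max_cycles=20,
                           quiet_cycles=2, small_cycles=5, small_frac=0.02):
    """Decide whether to leave the Gibbs cycle loop.

    change_fractions -- one entry per completed cycle: the fraction of
                        unit activations that changed during that cycle
    """
    n = len(change_fractions)
    # A hard cap on the number of cycles is always needed.
    if n >= max_cycles:
        return True
    # Two consecutive cycles with no activation changes at all.
    if n >= quiet_cycles and all(
            f == 0.0 for f in change_fractions[-quiet_cycles:]):
        return True
    # Five consecutive cycles in which under 2 percent of units changed.
    if n >= small_cycles and all(
            f < small_frac for f in change_fractions[-small_cycles:]):
        return True
    return False
```

The caller would append one change fraction after each cycle and stop cycling once the function returns True.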

Such an alternative embodiment would have the
advantage that when one interpretation of the input (i.e., one set of
network activations) is much more likely than the rest, very
little cycling would be required. This could often be the case
once extensive training has already been done. However,
one would need to deal with the issue of how much training
to do on each cycle, given that a different number of cycles
would be done on different patterns (this could be especially
tricky for bias weights). Also, some maximum number of
cycles would still need to be set. These added complications
are the main reasons that I do not prefer such an
embodiment.

System Output Determination

Compute a Relative Probability for the Network 

As mentioned above, the Gibbs cycling process
(including weight and variable updating) is the same for
each network in the recognition system. This is also true of
computation of a network probability value, which takes
place after the cycling, unless only training mode is active,
in which case the probability value is unnecessary. The
probability of network c is stored in the variable
NETWORKPROB[c] once computed, which will be used in
computing an output for the overall recognition system.

Computation of NETWORKPROB values is easy, given
that we have already computed UNITPROB values for all
the units in the network. NETWORKPROB[c] is just the
product over all layers LAY and units i in network c of
UNITPROB[LAY][i]. (Of course, if one were using the
modified method described above of using log probabilities
for UNITPROB, then NETWORKPROB[c] would be a sum
of the UNITPROB values instead.) The NETWORKPROB[c]
variable represents the probability of the network c
model (as embodied by its architecture and modifiable
weights, and as estimated here by a sampling of probable
activation states) AND the input signal 26 (again, viewing
the network as a generator of input signals). The
NETWORKPROB values can thus be compared to see which is
more probable for this particular input signal 26.

Setting OUTPUT

As shown in FIG. 4, once all networks in the system have
been processed (for recognition, training, or both,
depending on the system mode), the network loop is exited. If only
training mode is enabled, processing is now complete for
this input signal 26. However, if recognition mode is
enabled, the system output must be determined.

The system output is stored as the variable OUTPUT[ ],
and is simply the index of the network which is most
probable, given the current input signal 26. (Note that
OUTPUT[ ] is a single-element array here.) This index is
then used as appropriate by the effector 38, as described
previously. If, as is preferable, the classes in the pattern
domain are equally probable a priori, OUTPUT is just the
index of the network with the largest NETWORKPROB
value.

It is often the case, though, that classes in the pattern
domain have different prior probabilities. In this case, an
estimate should be made of these probabilities ("priors"),
and stored in the array CLASSPROB[ ]. Then, for each class
c, NETWORKPROB[c] should be multiplied by
CLASSPROB[c], with the result stored in
NETWORKPROB[c] (unless logs were used for the
UNITPROBs and NETWORKPROBs, in which case the log of the
CLASSPROBs should be added to the corresponding
NETWORKPROB values). The NETWORKPROB values can
then be compared in the same way as if the prior
probabilities were equal.

Once the appropriate action 70 is taken based on
OUTPUT[ ], processing of the current input signal 26 is
complete. The next step is to (potentially) select a new input
signal 26, and repeat the processing for a trial (see
Regulation of trials section above).

Preferred Embodiment 2
Architecture Figure and Flow Diagram

My second preferred embodiment is described with
reference to FIGS. 10 through 12. FIG. 10 illustrates the
structure of the second preferred embodiment in more detail
than in FIG. 1. FIG. 11 provides a flow chart illustrating an
outline of the software implementation in more detail than in
FIG. 2, and FIG. 12 provides a more detailed flow chart of
the steps involved in training the feature detectors 28.

Theory

My second preferred embodiment is different in many
respects from the first preferred embodiment, and thus
indicates to some extent the range of useful embodiments
made possible by the invention. It uses independent feature
learning to create a data compression device, which is then
used as the front end to a (well-known) backpropagation
network.

One way to make an intelligent guess as to what features
are contained in a pattern is to use the existing feature
detectors to segment it; this is the technique used by my first
embodiment. Another way, though, is to use actual previous
patterns, stored in memory; this is the approach used by my
second embodiment. The heuristic underlying this strategy
is the following: A feature could be defined as that which
distinguishes two similar, but nonidentical, patterns, i.e.,
the "difference" between them. Therefore, a reasonable way
to find likely features in a pattern is to compare it to stored
patterns which are similar but not identical, and to compute
some sort of difference for each such comparison. Having
done this, the likely features (which are "parts", in the
terminology of this invention specification) can be used to
train the feature detectors.

The overall approach of the second preferred
embodiment, then, is the following. The memory 40 is used
to store, in a lossless manner, a large number of
"comparison" patterns from the input domain to be learned. Each new
training pattern is compared to one or more comparison
patterns, and a difference vector DIFF[ ] is generated for
each comparison. The assigner 66 compares each difference
vector (part) to the preferred feature of each feature detector
28[m], as communicated by the feature description signal
32[m]. The detector 28[m] which best matches the difference
vector "wins" that difference vector, and this information is
communicated to the updater 42 via the part mapping signal
44. The updater 42 moves the winning detector's preferred
feature toward the difference vector it won, by some
amount.

After a sufficient amount of such training, the feature
detectors 28 are used as an input layer to a backpropagation
based neural network, which plays the role of the classifier
34. This is done by making the feature activity signal 30 the
input to the backprop network, and having each of the
feature detectors 28 become active to the extent that its
preferred feature is found in a subsequent new input signal
26. Conventional supervised training is then done on the
backprop network using these pretrained feature detectors
28. The result is a pattern recognition system which requires
fewer resources due to the learned data compression input
layer. Furthermore, since the trained feature detectors 28
represent valuable information about the pattern domain,
their preferred features may be copied or otherwise
transferred to a comparable recognition system in order to avoid
training the comparable system.

Because the memory 40 and feature detectors 28 are
separate in this embodiment, the segmentation into parts
(likely features) is likely to be not as good overall as that of
embodiments (such as my first preferred one) which tightly
integrate these two subsystems. Also, data compression is a
lossy procedure, and as with other unsupervised procedures,
there is no inherent way of forcing the learning of features
which are relevant for the classification task at hand. For
these reasons, this embodiment especially should be used as
a tool, and only when experimentation is possible, not as a
"quick-fix" solution for a mission-critical task. Of course,
this is true to some extent of all adaptive pattern recognizers,
including my first preferred embodiment, to the extent that
the pattern domain is not well-understood.

While this embodiment lacks a tight integration of feature
detectors 28 and memory 40, it is also somewhat more
simple to implement than embodiments such as my first
preferred one. Furthermore, it allows a very wide variety of
backprop networks to be used as the classifier 34, which
makes it a very flexible and powerful pattern recognition
tool.

Implementation

As mentioned, the core of the second preferred
embodiment is implemented in software on a general-purpose
digital computer. Thus a conceptual mapping exists between
the structural subsystem description of FIG. 10 and the
concrete implementation description. This mapping is as
follows.

The input signal 26 is implemented by the storage and
subsequent retrieval from computer memory of a variable
INPUT[ ] (note that the term "computer memory" should not
be confused with the "memory 40", although the former is
used in implementing the latter, of course). The preferred
features of the feature detectors 28 are stored in computer
memory as an array variable WEIGHT[ ][ ]. The feature
description signal 32 is implemented by storage and retrieval
of appropriate elements of the WEIGHT array from
computer memory. The feature activity signal 30 is implemented
with the storage and retrieval of an array ACT[ ].
Implementation of the feature detectors 28 includes program code
which computes the value of ACT[ ]. The classifier 34 is
implemented by program code which provides the
functionality of a (conventional) backpropagation network. This
backprop program code computes a value for the variable
OUTPUT. Storage and retrieval of OUTPUT implements the
output signal 36.

The memory 40 includes computer storage and program
code for storing, in their entirety, a sequence of training
patterns, each represented by a distinct input signal 26.
Implementation of the retrieval signal 68 includes storage
and retrieval of COMPAREPAT values, each of which
represents a "comparison" pattern, and is one of the training
patterns. The assigner 66 implementation includes code that
computes a difference between a current training pattern,
TRAINPAT, and a current comparison pattern,
COMPAREPAT. It also includes storage for a variable DIFF[ ],
representing this difference. It further includes code for finding
the feature detector 28[IMIN] whose preferred feature best
matches DIFF[ ]. The part mapping signal 44 is
implemented by storage and retrieval of the variables DIFF[ ] and
WEIGHT[IMIN][ ]. The updater 42 implementation
includes code for modifying WEIGHT[IMIN][ ] in the
direction of DIFF[ ].
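
The roles of the assigner 66 and updater 42 might be sketched as below. This passage does not define the difference measure, the matching criterion, or the step size, so the elementwise difference of binary patterns, the squared Euclidean match, and the `rate` parameter are all assumptions made for illustration, as is the function name.

```python
def train_on_comparison(weight, trainpat, comparepat, rate=0.1):
    """One comparison step: compute DIFF, find IMIN, update WEIGHT[IMIN].

    weight     -- list of M weight vectors of length N (WEIGHT[m][n])
    trainpat   -- current training pattern, N values in {0, 1}
    comparepat -- current comparison pattern, same length
    Returns (imin, diff); weight[imin] is modified in place.
    """
    # Assigner: difference between the patterns (assumed: 1 where the
    # two binary patterns disagree, 0 where they agree).
    diff = [abs(t - c) for t, c in zip(trainpat, comparepat)]

    # Assigner: detector whose preferred feature best matches DIFF
    # (assumed: smallest squared Euclidean distance).
    def distance(w):
        return sum((wn - dn) ** 2 for wn, dn in zip(w, diff))

    imin = min(range(len(weight)), key=lambda m: distance(weight[m]))

    # Updater: move the winning weight vector toward DIFF by a fraction
    # `rate` of the remaining gap (assumed step rule).
    weight[imin] = [wn + rate * (dn - wn)
                    for wn, dn in zip(weight[imin], diff)]
    return imin, diff
```

With a rate between 0 and 1 and binary differences, a step of this form keeps each weight between its old value and the corresponding DIFF element.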
Architecture and Parameter Selection 

Certain aspects of the system architecture will be deter- 
mined by the problem to be solved. The number of input 
units (those in the bottom layer) will be determined by the 
input representation chosen. Recall that this representation is 
preferred to be 0/1 binary, but other than that, its composi- 
tion is left to the designer. The creation of an appropriate 
input representation is a common task in the prior art pattern 
recognition literature. 

This embodiment only has one layer of independent 
feature learning, which includes the weights to the feature 
detectors 28 from the input units (whose activities are 
communicated by the input signal 26); these weights 
embody the preferred features, and will be stored as the 
variable WEIGHT[ ][ ]. However, the backprop network
architecture may have multiple layers of connections. The 
considerations here will be the same as those in the prior art 
for backprop nets, with the extra consideration that the 
inputs to the backprop net will come from a data compres- 
sion layer. If anything, this may eliminate the need for one 
layer of a backprop net which would have been used without 
data compression; but preferably the backprop architecture 
should not be changed from what it would have been without 
data compression. 

The number of feature detectors 28 to use in the unsu- 
pervised layer — which corresponds to the number of input 
units in the backprop part of the system — is a parameter 
which will require experimentation to achieve an optimal 
value. This is characteristic of prior art devices as well, when 
they have hidden units. Normally the number should be less
than the number N of input units in the unsupervised 
network; otherwise, there is no data compression occurring. 
The first number tried should be the best guess at the number 
of independent features in the input domain. A typical 
method of experimentation is to start with a very small 
number of units in the layer, and increase the number after 
each training run, as long as performance of a trained system 
(on a cross-validation data set) improves and experimenta- 
tion time allows. 
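
The search just described can be sketched as a simple loop. The `evaluate` callback is hypothetical: it stands for training a complete system with m feature detectors and returning its score on a cross-validation set (higher is better). The starting size, increment, and run limit are placeholder assumptions.

```python
def choose_num_detectors(evaluate, start=2, step=2,
                         max_units=None, max_runs=10):
    """Grow the feature-detector layer while validation score improves.

    evaluate(m) -- hypothetical callback returning the cross-validation
                   score of a system trained with m feature detectors
    max_units   -- optional ceiling, e.g. the number N of input units
    max_runs    -- stands in for "experimentation time allows"
    """
    best_m, best_score = None, float("-inf")
    m = start
    for _ in range(max_runs):
        if max_units is not None and m > max_units:
            break
        score = evaluate(m)
        if score <= best_score:   # no improvement: stop growing
            break
        best_m, best_score = m, score
        m += step
    return best_m, best_score
```

Setting `max_units` below the number of input units reflects the remark that a larger layer performs no data compression.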

The backprop layers may be constructed in accordance 
with any compatible feedforward backprop network to be 
found in the prior art (to be compatible, the backprop 
architecture must allow M real-valued inputs, of magnitude 
possibly greater than one). The inputs to the backprop 
network will be the transformed input signals 26, where the 
transformation uses the feature detectors 28; that is, the input 
to the backprop net will be the set of feature activity signal 
elements 30[m]. Note that for a typical backprop network, a
target signal 46 will be required for each input signal 26, 
representing the desired output signal 36 for that input signal 
26. 

Some good backpropagation reference material, as well as 
references to further relevant background, may be found in 
the following sources: The Handbook of Brain Theory and 
Neural Networks (cited above); Introduction to the Theory of 
Neural Computation, by Hertz, Krogh, & Palmer (1991,
Addison-Wesley, Redwood City, Calif.); and Neural
Networks for Pattern Recognition, by C. M. Bishop (1995,
Oxford University Press, Oxford, G.B.). 

Numerous commercial software packages are available to 
assist in implementing backpropagation (and the rest of my 
preferred embodiment, in some cases) in computer software. 
One especially powerful and flexible one which is currently 
available for free (with certain copyright restrictions) is the 
PDP++ package by O'Reilly, Dawson, and McClelland. This 
package is available (at the time of this writing) from the 
Center for the Neural Basis of Cognition (a joint program 
between Carnegie Mellon University and The University of 
Pittsburgh) on the internet at http://www.cnbc.cmu.edu/ 
PDP++/PDP++.html (also at http://einstein.lerc.nasa.gov/ 
pdp++/pdp-user 13 toc.ntrnl). The documentation for this 
package is also very useful for learning about backpropa- 
gation and its implementation using object oriented pro- 
gramming and the C++ language. 
Regulation of Trials (Pattern Presentations) 

The overall operation of the second preferred embodiment 
is illustrated in FIG. 11. As with the first preferred 
embodiment, operation of this one can be viewed as a 
sequence of trials, each of which includes presentation of a 
single input signal 26. Preferably the trials are divided into 
a set of training trials (i.e., with only training mode enabled), 
followed by a set of recognition trials (with only recognition 
enabled). One possible confusion here is that "recognition" 
is taken to include all operations performed with the back- 
prop network — including training of the backprop net. Since 
this device does nothing special with respect to the backprop 
net, other than provide it with different inputs than it would 
otherwise have, backprop training will not be detailed, and 
is considered a "recognition" operation herein. I will spe- 
cifically call it "backprop training" to distinguish it from 
"training", when necessary; the latter is meant to refer only 
to training of the unsupervised feature detectors 28 of my 
device. 

Initialization Before Trials 

Before any trials occur, the memory 40 is loaded with the 
training set. The memory 40 is implemented as a two- 
dimensional array variable MEMORY[ ][ ], where the first
dimension ranges over patterns, and the second ranges over 
elements within a pattern. Note that MEMORY[ ][n] corre- 
sponds to INPUT[n]. 

Preferably all patterns in the training set are stored in the 
memory 40. However, if the training set is especially large, 
a non-preferred embodiment could be tried in which a 
random sample is chosen as the comparison patterns to store 
in the memory 40. If so, patterns in the sample should be 
chosen independently and at random according to their 
distribution in the pattern domain. 

The preferred features of the feature detectors 28 are 
implemented using the array WEIGHT[ ][ ]. The first
dimension of WEIGHT ranges over the M feature detectors 
28, and the second ranges over the N input units. Note this 
is the reverse of the WEIGHT indexing scheme of the first 
preferred embodiment, because this embodiment is more 
naturally viewed as a pattern interpreter than a pattern 
generator (although both embodiments can be viewed in 
either way). 

The weights must be initialized to small random values 
before any trials. Preferably they should be uniform random 
in the range 0.02 to 0.04, but this could be an experimental 
parameter if resources allow such experimentation. A pos- 
sible improvement, which I have not tested, is to set the 
weight vector WEIGHT[m][ ] for each feature detector 
28[m] to a small multiple of (e.g. 0.01 times) a randomly 
selected training pattern, and then add a small random 
number (e.g. between 0.02 and 0.03) to each weight (such 
that positive weights always result). Note that no range 
limitation is placed on the weights during learning as in the 
first preferred embodiment, although the learning procedure 
itself will maintain the weights within the 0 to 1 range. 
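
The initialization just described can be sketched in Python as follows. This is only an illustrative sketch: the function names `init_weights` and `init_weights_from_patterns`, and the `seed` parameter, are hypothetical and are not part of the specification. The second function corresponds to the possible improvement noted above as untested.

```python
import random

def init_weights(m, n, low=0.02, high=0.04, seed=None):
    """Return an M x N list of small uniform random weights (WEIGHT[m][n])."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(n)] for _ in range(m)]

def init_weights_from_patterns(memory, m, scale=0.01, low=0.02, high=0.03, seed=None):
    """Untested variant: a small multiple of a randomly selected training
    pattern, plus a small positive random offset, for each feature detector."""
    rng = random.Random(seed)
    weights = []
    for _ in range(m):
        pattern = rng.choice(memory)  # binary pattern with elements 0 or 1
        weights.append([scale * bit + rng.uniform(low, high) for bit in pattern])
    return weights
```

Because the offset is strictly positive and the patterns are binary, every weight produced by the second function is positive, as the text requires.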
Training Trials 

The operation over training trials is illustrated in more 
detail in FIG. 12. On each training trial, a pattern is selected 
from the training set independently and at random according 
to the pattern domain distribution. Preferably this training 
pattern comes from the patterns stored in MEMORY. The 
training pattern is stored in the array TRAINPAT[ ].

Given a selected TRAINPAT, a loop is next performed 
over comparisons. Each comparison begins with the selec- 
tion of a random pattern from MEMORY, and storage of it 
in the array COMPAREPAT[ ]. 

A test is performed on TRAINPAT and COMPAREPAT to 
determine if they are identical on every binary element. If so, 
COMPAREPAT is marked as "used" for this trial, and 
processing moves to the next comparison. 

A second test determines whether TRAINPAT and COM- 
PAREPAT differ only by "noise"; that is, whether they are 
"essentially identical". The definition of "noise" generally 
depends on the problem at hand, so to achieve optimal 
performance this test could be experimented with. If experi- 
mentation is not possible, however, my preferred test should 
be used, which is to reject differences with a Hamming 
distance (number of differing bits) of one. The purpose of 
this test, (which purpose should guide any experimental 
changes), is to reject those differences which don't represent 
a true feature of the pattern domain. If TRAINPAT and 
COMPAREPAT are judged to differ only by noise, COM- 
PAREPAT is marked as "used" for this trial, and processing 
moves to the next comparison. 
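
The identity test and the preferred "essential identity" test can be sketched as follows. The function names and the `noise_bits` parameter are hypothetical; only the Hamming distance of one is taken from the text.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))

def is_identical(trainpat, comparepat):
    """First test: the patterns agree on every binary element."""
    return hamming_distance(trainpat, comparepat) == 0

def differs_only_by_noise(trainpat, comparepat, noise_bits=1):
    """Second test: the patterns differ, but by no more than noise_bits bits."""
    return 0 < hamming_distance(trainpat, comparepat) <= noise_bits
```

A comparison pattern for which either function returns true is marked as "used" and skipped.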

Another test is next performed which attempts to restrict 
the comparisons to differences of one, or at most a small 
number, of features. It is called the "dissimilarity test", 
because the goal is to throw out comparison patterns which 
are highly dissimilar from the training pattern. Ideally only 
pairs of patterns which differ by a single feature would be
used, as these are best at indicating what the features of the 
pattern domain are. However, we can't identify the features 
a priori, so we can only use heuristics to guess at the number
of differing features for a given pair of patterns.

My preferred dissimilarity test is to reject comparisons 
which have a Hamming distance greater than some fixed 
percentage of the number N of input units. I recommend 
using a value of 20%, as shown in FIG. 12. However, if
experimental resources permit, a crude optimization of this
value should be performed. (Note that the Hamming dis- 
tance used should never be less than or equal to that of the 
"essential identity" test, or else all comparisons would be 
rejected! Such excessive restriction must also be avoided if 
other, non-preferred tests are used.) This preferred value of
20% assumes that the input patterns are not sparse; that is,
that on average a roughly equal number of pattern elements 
are on as are off. If this is not true, the preferred value should 
be computed by determining the average number of "on" 
bits in a pattern, over the entire training set, and using 40%
of that average number.
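
The choice of limit for the dissimilarity test can be sketched as follows. The function names and the `sparse` flag are hypothetical; the 20% and 40% figures are the preferred values given above. Note that when half of the elements are on, the average number of "on" bits is N/2, and 40% of N/2 equals 20% of N, so the two rules agree.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def dissimilarity_limit(memory, sparse=False, frac_of_n=0.20, frac_of_on=0.40):
    """Largest Hamming distance accepted by the dissimilarity test."""
    n = len(memory[0])
    if not sparse:
        return frac_of_n * n
    # Sparse patterns: use a fraction of the average number of "on" bits.
    mean_on = sum(sum(p) for p in memory) / len(memory)
    return frac_of_on * mean_on

def passes_dissimilarity_test(trainpat, comparepat, limit, noise_bits=1):
    """True if the pair is not too dissimilar to be used for learning."""
    if limit <= noise_bits:
        # Otherwise every comparison would be rejected by one test or the other.
        raise ValueError("limit must exceed the essential-identity distance")
    return hamming_distance(trainpat, comparepat) <= limit
```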

It must be emphasized that this test will not be perfect, 
even with an optimized percentage. The problem is that a 
true feature could conceivably consist of a very large number
of input units. However, the alternative (a method
which considers every non-identical pattern pair to differ by
a single feature) is much less theoretically justified. Also, as
always, if the system designer has some reason to believe 
that a particular value would be more appropriate for a given 
pattern domain than my suggested value, the designer's
informed guess is preferred as the starting point for
experimentation.

Assuming COMPAREPAT passes the identity, near-identity,
and dissimilarity tests, a difference vector is computed
and stored as the variable DIFF[ ]. DIFF is obtained
by the bit-wise operation AND-NOT. For two boolean
variables x and y, the value of x AND-NOT y is true (equals
one) if and only if x is true and y is false. Thus, each element
DIFF[n] is set to the value of TRAINPAT[n] AND-NOT
COMPAREPAT[n].
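
As a minimal sketch, assuming patterns are held as lists of 0 and 1, the difference vector could be computed as follows (the function name `and_not_diff` is hypothetical):

```python
def and_not_diff(trainpat, comparepat):
    """DIFF[n] = TRAINPAT[n] AND-NOT COMPAREPAT[n] for binary (0/1) patterns."""
    return [t & (1 - c) for t, c in zip(trainpat, comparepat)]

# The four input combinations: only (1, 0) yields 1.
diff = and_not_diff([1, 1, 0, 0], [1, 0, 1, 0])
assert diff == [0, 1, 0, 0]
```

DIFF thus keeps only those elements which are on in the training pattern and off in the comparison pattern.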

A loop is next entered over the M feature detectors 28. For
each such detector m, a variable DIST is computed which is 
the Euclidean distance between WEIGHT[m][ ] and DIFF[ 
]. The minimum value of DIST over all feature detectors, 
and the index m corresponding to the minimum, are maintained
in MIN and IMIN, respectively.
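
The search for the minimum-distance feature detector can be sketched as follows. The function name `find_winner` is hypothetical, and ties are resolved here in favor of the lowest index, which is an assumption the text does not address.

```python
import math

def find_winner(weight, diff):
    """Return (IMIN, MIN): the index of the feature detector whose weight
    vector is closest to DIFF in Euclidean distance, and that distance."""
    imin, min_dist = -1, math.inf
    for m, row in enumerate(weight):
        dist = math.sqrt(sum((w - d) ** 2 for w, d in zip(row, diff)))
        if dist < min_dist:  # strict comparison keeps the first of any ties
            imin, min_dist = m, dist
    return imin, min_dist
```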

Once the minimum-distance feature detector 28[IMIN] is 
found, its preferred feature WEIGHT[IMIN][ ] is moved
toward the current difference vector DIFF[ ]. Note that DIFF
represents a "part" of TRAINPAT for which feature detector
28[IMIN] is taking responsibility.

The amount of learning done on each comparison is 
determined by LRATE, the learning rate. LRATE is equal to 
1.0, times the reciprocal of the number of comparisons 
(including rejections) done on the trial (which equals
NUMPATS, the number of training patterns, in my preferred
embodiment), divided by ITRIAL, the index of the current
trial (beginning with 1). For each element n of WEIGHT[IMIN][ ]
and DIFF[ ], the difference DIFF[n] - WEIGHT[IMIN][n]
is computed, and multiplied by LRATE, and the
result added to WEIGHT[IMIN][n].
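
The learning rate and the update of the winning feature detector can be sketched as follows (the function names are hypothetical). With this rule, LRATE = 1.0 / (NUMPATS x ITRIAL) in the preferred embodiment, so the step size shrinks as trials accumulate.

```python
def learning_rate(num_comparisons, itrial):
    """LRATE = 1.0 * (1 / comparisons per trial) / ITRIAL, with ITRIAL >= 1."""
    return 1.0 / (num_comparisons * itrial)

def update_winner(weight, imin, diff, lrate):
    """Move WEIGHT[IMIN][ ] toward DIFF[ ] by the fraction LRATE, in place."""
    row = weight[imin]
    for n in range(len(row)):
        row[n] += lrate * (diff[n] - row[n])
```

For example, with LRATE equal to 0.5, a weight of 0.5 facing a DIFF element of 0 moves to 0.25, and a weight of 0.0 facing a DIFF element of 1 moves to 0.5.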

The comparison loop continues in this fashion until all 
comparison patterns have been exhausted. New comparison 
patterns are chosen without replacement, so that each one 
from the comparison set in MEMORY is used once and only
once for each TRAINPAT.

After all comparisons have been performed, and features 
updated, for this training pattern, a new training pattern is 
selected. TRAINPATs, like COMPAREPATs, are not
replaced in the pool once selected, so that each will be used 
once and only once, until all NUMPATS patterns have been 
used (at which point training may continue on a new cycle 
through the training patterns). 
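
Putting the preceding steps together, one training trial might be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name `training_trial`, its parameters, and the returned per-detector win counts are hypothetical, the preferred tests (Hamming distance of one for noise, 20% of N for dissimilarity) are used, and ties between detectors go to the lowest index.

```python
import math
import random

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def training_trial(trainpat, memory, weight, itrial, rng,
                   noise_bits=1, max_frac=0.20):
    """Run all comparisons for one TRAINPAT, updating weight in place.
    Returns the number of comparisons won by each feature detector."""
    n = len(trainpat)
    lrate = 1.0 / (len(memory) * itrial)
    order = list(range(len(memory)))
    rng.shuffle(order)  # comparison patterns are drawn without replacement
    wins = [0] * len(weight)
    for idx in order:
        comparepat = memory[idx]
        d = hamming_distance(trainpat, comparepat)
        if d <= noise_bits or d > max_frac * n:
            continue  # identical, noise-only, or too dissimilar: rejected
        diff = [t & (1 - c) for t, c in zip(trainpat, comparepat)]
        imin = min(range(len(weight)),
                   key=lambda m: math.dist(weight[m], diff))
        for k in range(n):
            weight[imin][k] += lrate * (diff[k] - weight[imin][k])
        wins[imin] += 1
    return wins
```

A caller would present each training pattern once per pass through the training set, in shuffled order, incrementing `itrial` after every call.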
Stopping the Learning Process 

A decision is made at some point to stop learning. My 
preferred method for this is to keep track, for each training 
pattern, of the number of comparisons on which each feature 
detector wins. That is, a 2-D array NUMWINS[ ][ ] is
maintained, where NUMWINS[m][t] is the number of times
feature detector m won a comparison on trial t. The entire 
training set is presented repeatedly (as indicated by the 
"recycle set as necessary" instruction in FIG. 12), each 
iteration being as already described, until either (1) no 
element in the NUMWINS[ ][ ] array changes during the 
training set iteration, or (2) a maximum number of training 
set iterations is performed. The maximum could be experi- 
mented with, but my preferred value is 20. 
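
The preferred stopping rule can be sketched as follows. Here `run_iteration` is a hypothetical callable which performs one full pass through the training set and returns the NUMWINS[ ][ ] array for that pass. "No element changes" is interpreted as equality with the array from the previous pass, which is an assumption about the intended comparison.

```python
import copy

def train_until_stable(run_iteration, max_iterations=20):
    """Repeat training set iterations until NUMWINS stops changing or the
    maximum is reached. Returns the number of iterations performed."""
    previous = None
    for iteration in range(1, max_iterations + 1):
        numwins = run_iteration()
        if numwins == previous:
            return iteration  # criterion (1): no element changed
        previous = copy.deepcopy(numwins)
    return max_iterations     # criterion (2): iteration limit reached
```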

Note that while this procedure requires multiple iterations 
through the training set, the trial index ITRIAL should not 
be reset, since it represents the total number of training trials 
which have occurred. Another index variable should be used 
to keep track of pattern presentations within a given training 
set iteration. 

If experimental resources permit, it may be useful to try 
different criteria for stopping learning. This is especially true 
with large training sets, where learning may converge to an 
acceptable state within just one training set iteration. One 
such technique would be to maintain a running average for 
the MIN values (Euclidean distance between winning fea- 
ture detector IMIN and the DIFF vector it wins), and stop 
learning when a plot of this running average reaches some 
criterion (small) slope. 
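
One possible form of such a slope criterion is sketched below. The window length, the slope threshold, and the comparison of two adjacent window averages are all assumptions chosen for illustration; none of them is specified in the text.

```python
from statistics import mean

def should_stop(min_values, window=100, slope_criterion=1e-3):
    """Estimate the slope of the running average of MIN from two adjacent
    windows, and report whether its magnitude is below the criterion."""
    if len(min_values) < 2 * window:
        return False  # not enough history to estimate a slope
    recent = mean(min_values[-window:])
    earlier = mean(min_values[-2 * window:-window])
    slope = (recent - earlier) / window
    return abs(slope) < slope_criterion
```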
Using the Backprop Network 

Once training is finished, training mode is disabled and 
recognition mode is enabled. At this point, the particular 
backprop architecture and procedure employed will deter- 
mine the order and manner in which patterns are selected. 
Recall that, as mentioned previously, training of the back- 
prop network will now take place, but because the backprop 
network is a well-known module with respect to my device, 
all operations on it including training will be considered 
"recognition mode" herein. 

All patterns in the training set must be converted for use 
by the backprop module, whether done all at once before 
training of the backprop net (as preferred, and as shown in 
FIG. 11), or one at a time during its training. Once it is 
trained, new patterns to be recognized must also be con- 
verted to allow proper recognition. 

The conversion of patterns may be viewed as an input 
layer feeding into the backprop net — albeit an input layer 
which (now) has fixed weights, and different activation 
functions than the backprop net. In the terminology of this 
specification, the feature activity signal 30 forms the input to 
the backprop module. Thus I describe here how to produce 
this signal 30 for this embodiment, and leave the implemen- 
tation of backprop up to the user. The considerations that go 
into the particular implementation of backprop used are the 
same as those in prior art backprop nets, except as noted 
herein. 

As shown in FIG. 11, the feature activity signal 30 is 
stored as an array ACT[ ], and is determined as follows. The
input pattern (signal 26) is stored in the array INPUT[ ]. The
value for a given ACT[j] is computed as the inner product 
between the INPUT[ ] and WEIGHT[j][ ] vectors. (The 
inner product is also known as the dot product, and is a 
measure of the similarity of the two vectors.) 
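
As a minimal sketch, the conversion of an input pattern into the feature activity signal could be written as follows (the function name `feature_activity` is hypothetical):

```python
def feature_activity(inputs, weight):
    """ACT[j] = inner product of INPUT[ ] and WEIGHT[j][ ], for each detector j."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weight]

act = feature_activity([1, 0, 1], [[0.5, 0.5, 0.5], [1.0, 0.0, 0.0]])
assert act == [1.0, 1.0]
```

The resulting list, one value per feature detector, is what is passed to the backprop module in place of the raw input pattern.
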
Note that the ACT values will be real numbers, and may
fall beyond the range 0-1; in particular, they may range as 
high as the number of elements in the input, N. Such real-valued
inputs are not a problem in general for backpropagation
networks. However, there are some specialized
implementations of backprop which assume or prefer binary 
inputs or inputs which are less than or equal to one. Such 
implementations of backprop would not be appropriate for 
this preferred embodiment. 

The output of the backprop network, stored as the array
OUTPUT[ ], becomes the output signal 36. The activation
values of the backprop network's output units might be used 
directly, e.g. as posterior probability estimates, or a
classification index might be computed from them and used as the
output signal 36 (in the latter case, OUTPUT[ ] would only
be a one-element array). The exact method used depends on 
what type of effector 38 is used, and on the recognition 
problem being addressed; an appropriate method will be 
readily apparent to those skilled in the art, given a particular
recognition task.
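
One common way to compute a classification index from the output activations is to take the index of the largest one. This choice is an assumption made here for illustration; the text leaves the method open, and the function name is hypothetical.

```python
def classification_index(output_activations):
    """Index of the most active output unit (first one wins on ties)."""
    return max(range(len(output_activations)),
               key=output_activations.__getitem__)

# OUTPUT[ ] then holds this single element.
output = [classification_index([0.1, 0.7, 0.2])]
assert output == [1]
```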

CONCLUSION, RAMIFICATIONS, AND SCOPE 
OF INVENTION 

Thus the reader will see that a pattern recognition device 
according to the invention may be trained with fewer
examples of physical patterns than prior art devices applied
to the same task. Furthermore, the invention allows for 
improved generalization of learning given a relatively small 
training set. Still further, it allows for potentially improved 
scaling to relatively large architectures. 

While my above description contains many specificities, 
these should not be construed as limitations on the scope of 
the invention, but rather as exemplifications of preferred 
embodiments thereof. Many other variations are possible.
For example, neural network based embodiments could be
used which are not strictly layered (i.e. have "layer skip- 
ping" connections), or which use some pattern of connec- 
tivity other than full connectivity between layers, such as 
limited receptive fields. 

An embodiment similar to my first preferred embodiment
might update weights simultaneously with Gibbs sampling; 
that is, each unit could be sampled and have its weights 
modified before moving on to another unit. More generally, 
a given feature detector 28[a] may be modified by the
updater 42 before assignment of another part to another
feature detector 28[b] takes place (this is true for virtually
any other embodiment as well, including my second pre- 
ferred embodiment). 

Many other variations of the invention will become
apparent to those skilled in the art, especially upon observing
the relatively major differences between my two preferred
embodiments.

Accordingly, the scope of the invention should be determined
not by the embodiments illustrated and described, but
by the appended claims and their legal equivalents.
What is claimed is: 

1. A device for recognizing and responding to physical 
patterns, comprising: 

(a) transducer means for producing an input signal
representing a physical pattern in an environment;

(b) a plurality of feature detectors responsive to said input 
signal, each feature detector having weight means for 
storing a representation of a preferred feature, for 
producing a feature activity signal representing degrees
to which each of said preferred features exists in said
input signal, and for producing a feature description 
signal representing said preferred features; 
(c) classifier means responsive to said feature activity
signal, for producing an output signal representing a 
system action corresponding to said input signal; 

(d) effector means responsive to said output signal, for 
committing an action in said environment; 

(e) memory means responsive to said input signal, for 
approximately storing a representation of said input 
signal, and for producing a retrieval signal representing 
previously stored input signals; 

(f) assigner means responsive to said input signal and to 
said retrieval signal and to said feature description 
signal, for producing a part mapping signal represent- 
ing a mapping between a plurality of parts and at least 
one responsible feature detector, such that each part 
corresponds to a likely feature of said input signal and 
of said previously stored input signals; 

(g) updater means responsive to said part mapping signal, 
for modifying each of said responsible feature detectors 
so as to make its preferred feature more similar to its 
assigned part; 

whereby the modification of each of said responsible feature 
detectors is largely independent of the modifications of the 
other feature detectors; 

whereby said device can be effectively trained with fewer 
physical pattern examples than a device having correlated
feature training. 

2. The device of claim 1 wherein said part mapping signal 
represents a mapping between said plurality of parts and a 
plurality of responsible feature detectors, and is such that 
each responsible feature detector has a high correspondence
to its assigned part relative to the other feature detectors. 

3. The device of claim 2 wherein said memory means is 
responsive to said feature activity signal and to said feature 
description signal, and said retrieval signal is dependent 
upon said feature activity signal and upon said feature
description signal. 

4. The device of claim 3 wherein said feature detectors, 
and said classifier means, and said memory means, and said 
assigner means, and said updater means comprise executable
instruction code on a digital computing machine.

5. The device of claim 3 wherein said feature detectors are 
implemented with a neural network, such that the weight 
means for each feature detector comprises an array of 
modifiable connections configured for receiving said input 
signal.

6. The device of claim 5 wherein said neural network 
comprises executable instruction code on a digital comput- 
ing machine. 

7. The device of claim 5 wherein at least one unit of said 
neural network acts according to a noisy-OR function. 

8. The device of claim 7, further including means for 
storing a contribution to the activation probability of said at 
least one unit, such that said contribution may be accessed 
on a plurality of activation cycles. 

9. The device of claim 8 wherein said contribution is a 
sum over each inactive child unit of a negative logarithm of 
a quantity representing one minus the weight from said at 
least one unit to said inactive child unit. 

10. The device of claim 5 wherein said assigner means is 
configured to perform a soft segmentation of said input 
signal to obtain said parts. 

11. The device of claim 2 wherein said memory means is 
a lossless storage device. 

12. The device of claim 11 wherein each of said parts is 
a difference vector representing a difference between said 
input signal and a previously stored comparison pattern 
represented by said retrieval signal. 






13. The device of claim 12 wherein said assigner means 
is configured to assign each of said parts to a winning feature 
detector, said winning feature detector having the preferred 
feature which has a minimum distance from said difference 
vector. 

14. The device of claim 2 wherein said updater means is 
configured to modify each of said responsible feature detec- 
tors so as to make its preferred feature move to a new input 
space location which is substantially along the vector from 
its current input space location to the input space location of 
its assigned part. 

15. A method for creating a pattern recognition device, 
comprising the steps of: 

(a) providing transducer means for producing an input 
signal representing a physical pattern in an environ- 
ment; 

(b) providing a plurality of feature detectors responsive to 
said input signal, each feature detector having weight 
means for storing a representation of a preferred 
feature, for producing a feature activity signal repre- 
senting degrees to which each of said preferred features 
exists in said input signal, and for producing a feature 
description signal representing said preferred features; 

(c) providing classifier means responsive to said feature 
activity signal, for producing an output signal repre- 
senting a system action corresponding to said input 
signal; 

(d) providing effector means responsive to said output 
signal, for committing an action in said environment; 

(e) providing memory means for approximately storing 
input patterns, and for producing a retrieval signal 
representing previously stored input patterns; 

(f) using said memory means to approximately store a
sequence of comparison patterns; 

(g) providing a training pattern; 

(h) identifying a plurality of parts in said training pattern, 
such that each part corresponds to a likely feature of 
said training pattern and of said comparison patterns; 

(i) assigning each of said parts to a corresponding respon- 
sible feature detector; 

(j) modifying each of said responsible feature detectors so 
as to make its preferred feature substantially directly 
more similar to its assigned part; 
(k) training said feature detectors by repeating steps (g) 
through (j) on a significant portion of a training set until 
a training criterion is reached; 
whereby the modification of each of said responsible feature 
detectors is largely independent of the modifications of the 
other feature detectors; 

whereby said method allows effective creation of a pattern 
recognition device with fewer pattern presentations than a 
device having correlated feature training. 

16. The method of claim 15, further including the steps of: 
(l) repeating steps (a) through (d) to create a comparable
pattern recognition device;
(m) transferring the preferred feature of at least one of the 
trained feature detectors to at least one corresponding 
feature detector of said comparable pattern recognition 
device. 

17. The method of claim 15 wherein said memory means 
is responsive to said feature activity signal and to said 
feature description signal, and said retrieval signal is depen- 
dent upon said feature activity signal and upon said feature 
description signal. 

18. The method of claim 17 wherein said feature detectors 
are implemented with a neural network, such that the weight
means for each feature detector comprises an array of
modifiable connections configured for receiving said input 
signal. 

19. The method of claim 18 wherein at least one unit of 
said neural network acts according to a noisy-OR function. 

20. A device for recognizing and responding to physical 
patterns, comprising: 

(a) a transducer capable of producing an input signal 
representing a physical pattern in an environment; 

(b) a plurality of feature detectors each responsive to said 
input signal, each feature detector having weight stor- 
age capable of representing a preferred feature, each of 
said feature detectors being capable of producing a 
feature activity signal element representing a degree to 
which its preferred feature exists in said input signal, 
and being capable of producing a feature description 
signal element representing its preferred feature; 

(c) a classifier responsive to each said feature activity 
signal element, capable of producing an output signal 
representing a system action corresponding to said 
input signal; 

(d) an effector responsive to said output signal, capable of 
committing an action in said environment; 

(e) a memory responsive to said input signal, capable of
approximately storing a representation of said input 
signal, and capable of producing a retrieval signal
representing previously stored input signals; 

(f) an assigner responsive to said input signal and to said 
retrieval signal and to each of said feature description 
signal elements, capable of producing a part mapping 
signal representing a mapping between a plurality of 
parts and a plurality of responsible feature detectors, 
such that each part corresponds to a likely feature of 
said input signal and of said previously stored input 
signals, and such that each responsible feature detector 
has a high correspondence to its assigned part relative 
to the other feature detectors; 

(g) an updater responsive to said part mapping signal, 
capable of modifying each of said responsible feature 
detectors so as to make its preferred feature vector 
move substantially directly toward its assigned part 
vector;

whereby the modification of each of said responsible feature 
detectors is largely independent of the modifications of the 
other feature detectors; 

whereby said device can be effectively trained with fewer 
physical pattern examples than a device having correlated 
feature training. 